Anatomy of an AI-Powered Voice Assistant

"What can I help you with today?"

With modern advances in conversational artificial intelligence, consumers might be surprised to know that the first voice heard on the other end of a line often is digitally created. Customers, whether in the know or not, expect accurate and useful information in real time from a voice that sounds more human than digital.

We now live in an age of AI-powered super-assistants, with virtual services provided across such industries as finance, telecommunications, internet service providers, and health care. Consumers are interacting with voice assistants and chatbots that recognize speech and text inputs and respond quickly in an expressive and near-emotive way.

Conversational AI is on the rise, yet according to Gartner, more than 90 percent of consumers avoid engaging with businesses' virtual support agents, which leaves the investment largely unused. Many organizations struggle to build and customize voice assistants that can meet such high expectations. Consumers complain that many such voice assistants sound unnatural, are slow, and are unable to understand context.

Some of the limitations consumers cite can largely be resolved through AI advancements that reduce latency, improve understanding of colloquial jargon, and increase the ability to manage several topics at once.

Characteristics of a Good AI-powered Voice Assistant

Good AI-powered voice assistants today should be able to do the following:

  • Respond in milliseconds: One barrier to the adoption of chatbots and virtual assistants is that human conversations are contextual and fast-paced: the typical gap between responses in natural conversation is around 300 milliseconds. For AI to replicate human-like interaction, it might need to run 20 to 30 neural networks in sequence to generate an intelligent response, all within 300 milliseconds or less. That leaves an average of roughly 10 to 15 milliseconds per network.
  • Understand domain jargon: For AI to provide better customer satisfaction, pretrained models must be retrained to understand industry-specific jargon. Companies can use zero-coding tools within NVIDIA's Riva to fine-tune models on their custom data.
  • Recognize multiple contexts: In general, human conversations involve people taking turns, sometimes rapidly, and are highly contextual in nature, involving several topics in a single conversation. Companies can use dialogue managers in their AI assistants to remember the conversation’s state and flow to facilitate human-like discussions.
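
The idea of a dialogue manager that remembers a conversation's state can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the DialogueState class, the keyword table, and the detect_topic and update functions are hypothetical and are not part of Riva or any other product. A production assistant would likely use a trained intent model rather than keyword matching. The sketch only shows how stored state lets a follow-up with no topic cue stay on the current topic, and lets the caller return to an earlier one.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical keyword table standing in for a trained intent classifier.
TOPIC_KEYWORDS = {
    "billing": ("bill", "charge", "invoice"),
    "outage": ("outage", "offline", "down"),
}


@dataclass
class DialogueState:
    """Illustrative per-conversation memory (not a real library type)."""
    active_topic: Optional[str] = None
    topics_seen: List[str] = field(default_factory=list)
    turns: Dict[str, List[str]] = field(default_factory=dict)


def detect_topic(utterance: str) -> Optional[str]:
    """Return the first topic whose keyword appears in the utterance, if any."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    for topic, keywords in TOPIC_KEYWORDS.items():
        if any(word in keywords for word in words):
            return topic
    return None


def update(state: DialogueState, utterance: str) -> str:
    """Assign the utterance to a topic, falling back to the active topic."""
    topic = detect_topic(utterance) or state.active_topic
    if topic is None:
        return "unknown"
    if topic not in state.topics_seen:
        state.topics_seen.append(topic)
    state.active_topic = topic
    state.turns.setdefault(topic, []).append(utterance)
    return topic


state = DialogueState()
conversation = [
    "My internet has been offline since noon.",
    "Also, why is my bill higher this month?",
    "Can you email me a copy?",  # no keyword, so it stays on the active topic
    "Back to the outage, when will it be fixed?",
]
labels = [update(state, utterance) for utterance in conversation]
print(labels)
```

In this sketch the third request carries no topic keyword, so it is filed under the billing topic that was active at that point, and the fourth switches back to the outage without losing the earlier turns.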

To understand how conversational AI will develop in the future, it is critical to look at current business needs. The global COVID-19 pandemic has upended whole industries. Supply chains continue to be disrupted. Meanwhile, consumers increasingly are opting to purchase online, and worker shortages in call centers, warehouses, and across the trucking industry are exacerbating the push to rebalance demand with supply.

That has accelerated the shift toward AI solutions and automation. Businesses like Ping An, China's largest insurer, are working to enhance customer experiences by reducing wait times and providing authentic and accurate customer service via voice-based or text-based virtual agents. Using NVIDIA's Riva conversational AI software development kit, companies can build real-time speech applications that are constantly improving in accuracy.

Global companies also will need to customize for each market, with voice assistants for specific languages and domains. InstaDeep, a provider of decision-making AI products for companies, used Riva to reduce its algorithm's training time for Arabic-language markets from days to hours.

No one is sure when, or even if, businesses will see a return to pre-pandemic normalcy, and businesses can't wait to find out. One thing is certain: 2022 breakthroughs and new developments will continue to push the boundaries of what's possible, and conversational AI will be one of the leading technologies to radically evolve how businesses interact with their partners and customers.

Gartner recently predicted that by 2025, advanced virtual assistants will provide advisory and intervention roles for 30 percent of knowledge workers, up from 5 percent in 2021. Much of this expected growth stems from the rising number of millennials who will be entering the workforce. Because chatbots cater to millennials' demand for instant, digital connections that keep them up to date at all times, millennials will likely have a large impact on how well and how quickly organizations adopt the technology.

We can expect that with these growing business needs reaching new levels of urgency, the search for conversational AI software that allows for customization and rapid scaling will become all the more important.
