The 2023 State of Intelligent Virtual Assistants

Mixed messaging might be the best way to describe the intelligent virtual assistant (IVA) market in 2022. Companies are installing these solutions in growing numbers, but customer satisfaction with this input option remains tepid, and building these applications is challenging.

IVA suppliers have been fine-tuning their development and deployment approach as the technology has evolved. These systems were introduced years ago with a great deal of pomp and circumstance. However, the initial wave of general-purpose IVAs produced mediocre results for a variety of reasons.

Year in Review

Businesses have had to build applications from the ground up. “A lot of DIY chatbots have fallen short of customer expectations,” says Yan Zhang, COO of Poly AI, a supplier of conversational artificial intelligence solutions for customer service.

One challenge with voice input is the wide range of ways that individuals express themselves. Systems struggle to handle the open-ended answers that follow prompts like “How can I help you?”

Even straightforward input, like a customer’s personal data, can be taxing. “Many IVAs struggle to recognize names, like ‘Andy Phillips’ and ‘Yan Jung,’” Zhang says candidly.

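Compounding the problem, many companies took textual information from their websites and simply added voice interfaces. Web interfaces constrain input, so users have few choices; spoken input has no such guardrails, so designs built for the web translate poorly to voice.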

In addition, voice requires developers with special skills, like working with AI models and designing voice user interfaces. These areas are new, so most IT staff have limited expertise in them.

Integration has been another impediment. Like data software, speech requires a full stack of features to function. To date, these systems have been largely proprietary. In some cases, the vendor does a good job with one piece, say, voice recognition, but is lacking in another.

As a result, organizations have struggled to clear the many hurdles and, in a number of cases, deployed mediocre solutions. Consequently, customers often find the systems more annoying than helpful. In fact, 32 percent of consumers responding to a Verint survey said they rarely or never feel understood by chatbots.

The pushback has forced companies to rethink their deployments. “A number of bots were hastily deployed during the pandemic,” notes Heather Richards, vice president of go-to-market strategy for digital-first engagement at Verint. “As new economic realities set in, businesses were forced to prove that the solutions worked as promised.”

Increasingly aware of the problems, suppliers changed their approaches. Vendors narrowed their focus, and contact centers have been a logical place to start. Here, agents deal primarily with voice technology and not text.

Furthermore, the business case often is easy to establish. “The contact center represents the largest employee expense within U.S. companies,” Zhang states. Automating processes reduces personnel expenses and boosts productivity.

Microsoft wove IVA capabilities into its system and application infrastructure in April. Enterprises use Windows Conversational Agent APIs to provide more self-service options. Voice agent applications are activated by spoken keywords and enable a hands-free, voice-driven experience.

Multichannel support is another area of emphasis. In December, Zoho enhanced its Desk solution, so interactions toggle between chatbots and live agents in blended conversations.

Tuning AI data models is another ongoing challenge. Interactive Media improved its IVA so it better anticipates the next utterance. For instance, if a user provides a numerical code, the service deduces that the next utterance will most likely also be composed of numbers. It then uses the speech-to-text engine that best recognizes such input, improving accuracy and speeding up responses.
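The engine-selection idea can be sketched in a few lines of Python. This is only an illustration of the approach described above, not Interactive Media's implementation: the engine names, the `classify_utterance` rule, and the `pick_engine` function are all hypothetical.

```python
import re

# Hypothetical engine names; the article does not say which engines are used.
ENGINES = {
    "numeric": "digit_tuned_engine",
    "general": "general_purpose_engine",
}


def classify_utterance(text: str) -> str:
    """Label an utterance 'numeric' if it holds only digits, spaces, and hyphens."""
    cleaned = text.strip()
    # Require at least one digit so that a bare hyphen does not count as numeric.
    if re.fullmatch(r"(?=.*\d)[\d\s\-]+", cleaned):
        return "numeric"
    return "general"


def pick_engine(previous_utterance: str) -> str:
    """Assume the next utterance resembles the previous one and pick an engine."""
    expected_type = classify_utterance(previous_utterance)
    return ENGINES[expected_type]


if __name__ == "__main__":
    print(pick_engine("4 5 1 2"))         # caller just read out a code
    print(pick_engine("I lost my card"))  # free-form request
```

A production system would presumably use a learned model rather than a regular expression, but the routing step would look much the same.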

Progress has occurred. Around 90 percent of companies surveyed by Deloitte realized faster complaint resolution, and more than 80 percent reported handling higher call volumes using conversational AI solutions.

A Look Ahead

Best practices are emerging. “The contact center speech projects that have been the most successful start small and have a tight focus,” says Max Ball, a principal analyst at Forrester Research. Ironically, the COVID-19 pandemic forced many companies to take this tack. When the crisis struck, call volume rose to unprecedented levels and customer satisfaction dropped. Businesses were forced to quickly find ways to offload routine input from harried agents to machines.

Speech technology has had success in a few market verticals. “Travel, financial services, and healthcare have been areas where companies have been adopting conversational AI,” Verint’s Richards points out.

As a result, the technology has been gaining traction. The volume of interactions handled by conversational agents increased by as much as 250 percent in multiple industries, according to Deloitte’s research. The global conversational AI market is expected to grow at a compound annual growth rate of 22 percent through 2025, when it is expected to reach about $14 billion.
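As a rough check on those figures, the short Python sketch below back-computes the market size they imply for an earlier year. The 2022 starting year is an assumption made for illustration; the forecast cited above gives only the 22 percent rate and the roughly $14 billion figure for 2025.

```python
def implied_market_size(target_billion: float, cagr: float, years_before_target: int) -> float:
    """Discount a future market size back by compound annual growth."""
    return target_billion / (1.0 + cagr) ** years_before_target


# Assumption: the growth window starts in 2022, three years before 2025.
size_2022 = implied_market_size(14.0, 0.22, 3)
print(f"Implied 2022 market size: ${size_2022:.1f} billion")
```

Under that assumption, the market would have been a little under $8 billion in 2022.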

So what looms on the horizon? Macroeconomic issues might help the IVA market continue its deployment momentum in 2023. “Companies are under pressure to lower costs, and IVA automation offers them one way to reach that goal,” Forrester’s Ball anticipates. “At the end of 2022, we received a growing number of inquiries focused on best practices for IVA automation in the contact center.”

In addition, work is under way to make these solutions easier to deploy. Customers would like the products to be more modular and more similar to data application development, where creators can freely mix and match pieces from multiple vendors.

To attain that objective, industry standards are needed. Stanford University has been at the forefront of developing IVA standards, having assessed that commercial chatbots today are brittle because they are hardcoded to handle only a few possible user inputs. Recently, large language models, such as GPT-3, emerged. They are remarkably fluent but often produce erroneous results.

The university researchers created a development language called GenieScript to address these issues. The approach tries to combine the best of deep learning and programming systems, so businesses can create better IVAs. GenieScript’s design goal is to empower non-AI developers to quickly script high-level conversational flows. Genie hides the complexities common in natural language processing development.

The language has a number of components:

  • ThingTalk: a formal, expressive, executable meaning representation for task-oriented dialogues;
  • a contextual neural semantic parser for translating dialogues into ThingTalk;
  • automatic generation of training data for dialogues from high-level schema and API specification with the help of LLMs;
  • multilingual, mixed-initiative, multimodal assistants; and
  • federated privacy-protecting assistants.

As the basic framework takes shape, the group is working on extending it to address these five other challenging areas:

Correctness. Neural semantic parsers today have been built by fine-tuning language models such as BART. Semantic parsing is useful for grounding LLMs, allowing them to answer factual questions correctly by querying external knowledge bases. The group is trying to combine and enhance the two approaches.

Self-learning. Typically, IVAs follow single commands and reject inputs they fail to understand. The group wants assistants to learn from their mistakes and lower the rejection numbers.

Human values. The traditional approach to teaching LLMs human values (anti-toxicity, elimination of stereotypical biases, promotion of prosocial behavior) is to fine-tune them with hand-annotated data. The group wants to add thoughtfulness to the output, so IVAs mimic human emotions, like empathy.

Social intelligence. To be effective as a communicator, the assistant must also have social intelligence, which includes good conversational skills, knowledge of social norms, listening skills, attuning to others’ feelings, social self-efficacy, and impression management skills.

Multimodal assistants. Future assistants will naturally be multimodal, making the most of audio and graphical user interfaces. The group is creating a development framework that will automatically support mixed-mode operations with only a minor increase in programming complexity.

Other de facto and ad hoc standards groups are tackling voice development issues from different perspectives. Some of the work overlaps.

Amazon, for example, is driving interoperability mainly among smart home devices. The Amazon Voice Interoperability Initiative (VII) enables automated switching between two voice agents. The system transfers requests it cannot fulfill to another service.
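The handoff pattern described here, in which one agent passes along a request it cannot fulfill, can be sketched in Python as follows. This is a minimal illustration of the general idea and does not use or represent any Amazon API; the agent functions and the `route` helper are hypothetical.

```python
from typing import Callable, Optional, Sequence

# An agent returns a reply, or None when it cannot fulfill the request.
Agent = Callable[[str], Optional[str]]


def music_agent(request: str) -> Optional[str]:
    """Hypothetical agent that only understands playback requests."""
    if "play" in request.lower():
        return "music_agent: playing your request"
    return None


def home_agent(request: str) -> Optional[str]:
    """Hypothetical agent that only understands lighting requests."""
    if "lights" in request.lower():
        return "home_agent: lights adjusted"
    return None


def route(request: str, agents: Sequence[Agent]) -> str:
    """Try each agent in order and hand off when one cannot help."""
    for agent in agents:
        reply = agent(request)
        if reply is not None:
            return reply
    return "Sorry, no agent could handle that request."


if __name__ == "__main__":
    agents = [music_agent, home_agent]
    print(route("Turn off the lights", agents))
    print(route("Play some jazz", agents))
```

In this sketch the first agent declines the lighting request, so it falls through to the second agent without the user having to repeat it.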

A major announcement in 2022 came from OpenAI, an AI research and deployment company whose early funding came in large part from Elon Musk and other investors. The company drew on more than $1 billion in funding to create ChatGPT, a chatbot built on a large language model. It can be used for natural language processing tasks and focuses on the following four areas:

Text generation. ChatGPT generates humanlike text responses to prompts. Possible uses are customer service chatbots, automated responses to questions posted in online forums, or creating personalized content for social media.

Language translation. Here, users provide text prompts in one language and specify the target language, and the model translates the text.

Text summarization. The model condenses long documents or articles into shorter summaries.

Sentiment analysis. The model helps users understand the overall tone and emotion of a piece. One use case is customer service, where it can detect customer reactions to particular content.

The various standards are just taking shape. The specifications are being polished so that prototyping and interoperability testing can begin. How much they overlap and which ones gain acceptance will become clear in time.

Paul Korzeniowski is a freelance writer who specializes in technology issues. He has been covering speech technology issues for more than two decades, is based in Sudbury, Mass., and can be reached at paulkorzen@aol.com or on Twitter @PaulKorzeniowski.
