Speech Technology’s Continually Evolving Confidence Game

The proliferation of voice-first devices, ranging from smart speakers and appliances to “infotainment” consoles in cars, has given new life to speech technologies and voice user interfaces. This resurgence coincides with accelerating advancements in so-called cognitive computing, a term popularized by IBM when it unleashed Watson to recognize patterns and solve intractable problems in healthcare, telecommunications, financial services, retailing, education, e-government, and other verticals.

Watson applies high-powered computing resources to recognize patterns, understand what they mean, identify appropriate responses, and anticipate the next best actions. Sound familiar? It should for every veteran of the speech-enablement and VUI design movement that has spanned the past two or three decades. But instead of relying on the magic of the latest high-powered computing platforms, the older solutions relied on rules-based algorithms and probabilistic models that assigned a score to the level of certainty that an automated speech processing resource had correctly recognized what a caller had said and was responding with a correct answer or action.

It was, and is, a confidence game, in a literal sense. 
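
To make the confidence score concrete, here is a minimal sketch of how a voice application might act on it. The two thresholds, the `Hypothesis` type, and the `next_action` function are all assumptions made for illustration; they are not drawn from any particular vendor's product, and real deployments tune such cutoffs for each application.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
ACCEPT_THRESHOLD = 0.80
CONFIRM_THRESHOLD = 0.50


@dataclass
class Hypothesis:
    text: str          # what the recognizer thinks the caller said
    confidence: float  # recognizer's certainty, from 0.0 to 1.0


def next_action(hyp: Hypothesis) -> str:
    """Choose a dialogue move based on the recognizer's confidence score."""
    if hyp.confidence >= ACCEPT_THRESHOLD:
        # High certainty: act on the recognized request directly.
        return f"ACCEPT: {hyp.text}"
    if hyp.confidence >= CONFIRM_THRESHOLD:
        # Middling certainty: check with the caller before acting.
        return f"CONFIRM: Did you say '{hyp.text}'?"
    # Low certainty: ask the caller to repeat.
    return "REPROMPT: Sorry, I didn't catch that."
```

Under these assumed thresholds, a hypothesis scored at 0.93 is accepted outright, one at 0.62 triggers a confirmation question, and one at 0.21 leads to a reprompt.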

If I were to assign a mission statement for the community of intelligent assistance solutions providers, it would be this: “to recognize each individual’s intent and provide correct answers or actions consistently and at scale.” (Write that down.) As they compete for business and grow, each solutions provider describes its proprietary approach to solving the complex challenges involved in accomplishing the same overall mission.

It is still very early days for the flavors of intelligent assistants that rely on cognitive solutions, machine learning, spontaneous content generation, and human-like conversation models that embrace emotion and sentiment analysis and empathy. They are coming. But in the meantime, there is a lot of life and business value to be found in taking a hybrid approach that cherry-picks from available technologies that can perform categorization (to recognize the purpose of a call) and disambiguation (to make sense of confusing input) to support understanding.
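
As a rough illustration of categorization and disambiguation working together, the sketch below scores an utterance against a few intents by keyword overlap and asks a clarifying question when the top two candidates are too close to call. The intent names, keyword lists, and margin are hypothetical, and simple keyword matching stands in for the statistical models a production system would use.

```python
# Hypothetical intents and keywords, for illustration only.
INTENT_KEYWORDS = {
    "billing": {"bill", "charge", "payment", "invoice"},
    "tech_support": {"broken", "outage", "reset", "error"},
    "sales": {"upgrade", "buy", "plan", "price"},
}

# Hypothetical margin: the top two intents are treated as ambiguous
# when their scores differ by less than this.
AMBIGUITY_MARGIN = 1


def score_intents(utterance: str) -> list[tuple[str, int]]:
    """Categorization: count keyword hits per intent, best first."""
    # Naive whitespace tokenization; punctuation is not stripped.
    words = set(utterance.lower().split())
    scores = [(intent, len(words & keywords))
              for intent, keywords in INTENT_KEYWORDS.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def route(utterance: str) -> str:
    """Route to one intent, or ask a clarifying question if input is unclear."""
    ranked = score_intents(utterance)
    (best, best_score), (second, second_score) = ranked[0], ranked[1]
    if best_score == 0:
        # Nothing matched: fall back to an open question.
        return "CLARIFY: How can I help you today?"
    if best_score - second_score < AMBIGUITY_MARGIN:
        # Disambiguation: two purposes look equally likely, so ask.
        return f"CLARIFY: Is this about {best} or {second}?"
    return f"ROUTE: {best}"
```

With these assumed keywords, "I have a question about my bill" routes straight to billing, while "there is an error on my bill" matches both billing and tech support and prompts a clarifying question.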

Existing “recognizers” are up to the task of accurately recognizing spoken words. Nuance, Amazon, Apple, Google, and Microsoft have all reported accuracy rates that routinely exceed 95%. Statistical language models (SLMs) and hidden Markov models (HMMs), when coupled with improvements in the performance of new microphones and electronics for capturing audio, see to it that the quality of raw material for recognizing speech and understanding intent is worthy of high confidence.

Next up come training, learning, and understanding. Here, again, new technologies that involve cognitive resources and deep learning can shorten the time it takes to get a conversational intelligent assistant up and running. Yet performance improves greatly thanks to human input, particularly the ability to turn to subject matter experts, such as the best customer service agents, to help determine the best answers or actions to take.

As companies and brands evaluate their options for bringing on intelligent assistants as digital employees in their customer care or sales support contact centers, they should pay attention to results and return on investment. Banking, diversified telecom, insurance, travel, retailing, and hospitality are among the industries where neophyte assistants can benefit from all of the customer conversations that have come before. Scripts, decision trees, and SLMs for speech-enabled interactive voice response systems can join the content of chat transcripts, marketing literature, training materials, and human feedback as raw material and metadata to inform automated systems.

Incidentally, the same idea applies as companies hatch plans to extend their intelligent assistant resources across all digital channels, spanning both voice and text, involving smart speakers, message bots, and bots in cars. There’s a direct, evolutionary path from speech-enabled IVRs to JARVIS-like metabots. You just have to have confidence. 

Dan Miller is founder of and the lead analyst for Opus Research.
