August 19, 2014
By Leonard Klie, Editor, Speech Technology and CRM magazines

SpeechTEK Speakers Call for Conversational Technologies

NEW YORK—While there have been no major breakthroughs in the industry in a few years, speech technology vendors are making small advances in updating speech systems to be more conversational, adaptive, and natural, speakers maintained during an early morning panel at the SpeechTEK conference Monday.

At IBM, for example, much of the current focus is not just on speech recognition but on "technologies that make machines act more like us," said David Nahamoo, speech chief technology officer at IBM Research.

Speech recognition, Nahamoo said, has benefited from recent advances in neural networking and machine learning, but still has a way to go before it can mimic true dialogues.

Neural networking and Web technologies have definitely sped up the pace of innovation and improvements in speech recognition accuracy, according to Nahamoo.

But where speech continues to come up short, he added, is in the ability to conduct a real dialogue. "There's been no real progress on this in thirty years," he stated. "It's been stuck in the mud for a while."

Dan Miller, founder and senior analyst at Opus Research, also sees the need for speech to become more conversational. This, he said, is his "vision" for the technology.

While advances have taken place in semantics, natural language understanding, and even artificial intelligence, making speech more natural and adaptive has been elusive, according to Miller.

The biggest change needed, he added, is not in quality but in "how fluid it is and how easy it is to interact with."

Roberto Pieraccini, director of advanced communication technologies at Jibo, said another technology on the horizon blends speech with cameras and facial recognition to allow systems to read lips as a way to improve speech recognition accuracy.

In an afternoon session, Nandini Stocker, senior voice user interface (VUI) designer at Flare Design, pointed out that speech will always have difficulties mirroring human-to-human conversations because computers cannot pick up the nonverbal turn-taking cues that people give off when engaging with others.

There's a real art and science to designing interactive voice response (IVR) systems that get the timing right between questions, she said. And then it takes a lot of trial and error to provide the right amount of time for the caller to respond to a question in the IVR, she added.

Even then, it's hard for a system to anticipate all the ways that customers can respond, and simple things like repeating an answer can throw the system into error mode, Stocker said.

VUI designers can help the system by turning barge-in on and off as appropriate and giving prompts that guide how the caller responds, she added, but the most important step is for the designer "to be clear about what you're looking for."

To that end, Stocker strongly urged session attendees to include statements that tell the caller how to answer a prompt. With a bank IVR, for example, a prompt that asks the caller the reason for his call could end with "You can say account balance or make a payment."
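Stocker's advice on prompts can be illustrated with a minimal sketch. The code below is a hypothetical illustration, not any IVR platform's API: the `Prompt` class, the `match_option` helper, the barge-in flag, and the four-second timeout are all assumed names and values chosen for the example. It shows a prompt that ends by telling the caller what to say, and a matcher that tolerates a caller repeating an answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prompt:
    """A hypothetical IVR prompt; field names are illustrative only."""
    question: str
    options: List[str]
    barge_in: bool = True          # assumed flag: caller may interrupt the prompt
    timeout_seconds: float = 4.0   # assumed value; real systems tune this by trial and error

    def text(self) -> str:
        # End the prompt by telling the caller how to answer.
        if len(self.options) == 1:
            hint = self.options[0]
        else:
            hint = ", ".join(self.options[:-1]) + " or " + self.options[-1]
        return f"{self.question} You can say {hint}."


def match_option(prompt: Prompt, utterance: str) -> Optional[str]:
    """Return the single option heard in the utterance, or None.

    Substring matching means a repeated answer ("account balance,
    account balance") still resolves to one option instead of an error.
    """
    heard = utterance.lower()
    matches = [opt for opt in prompt.options if opt.lower() in heard]
    return matches[0] if len(matches) == 1 else None


bank_prompt = Prompt(
    question="What is the reason for your call?",
    options=["account balance", "make a payment"],
)
print(bank_prompt.text())
print(match_option(bank_prompt, "account balance account balance"))
```

Simple substring matching stands in for real speech recognition and grammar handling here; a production system would rely on the recognizer's own grammar or natural language understanding.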

Also key to making speech systems more conversational will be incorporating emotion detection, Nahamoo contended. This involves not just identifying the emotional state of a caller to an IVR, for example, but also shaping how the system responds. The emotional state the system portrays needs to be appropriate to the type of application, the company, and other factors, he and other speakers maintained.

Speech will also need to become more multichannel and multimodal, particularly taking advantage of advances in mobile technology, added Tom Schalk, vice president of voice technology at satellite radio provider Sirius XM. "With the use of mobile, the use of voice has gone up as well," he stated.

In that regard, the opportunities for more virtual assistant applications such as Apple's Siri and Microsoft's Cortana will expand greatly, Schalk argued.

And when they do, authentication will also need to be multifactor, blending technologies beyond just the basic speaker identification and voice biometrics, according to Bernie Brafman, vice president of marketing at Sensory. Sensory's TrulySecure solution, for example, merges speech with facial recognition to improve authentication for mobile devices, Brafman said.
