May 1, 2012
By David Myron, Editorial Director, Speech Technology Magazine
Editor's Letter

Building Fluidity into Machines

A clear theme runs through this issue of Speech Technology: in customer engagements, organizations should strive for fluidity, both within the interaction itself and between communication channels.

Starting with the interaction, making machines speak with the same fluidity as humans involves many complexities. Today, speech technologists rely on natural language understanding (NLU) technology to accomplish this goal. Our cover story by Staff Writer Michele Masterson, How Natural Is NLU?, delves into the latest advances in NLU and the expectations surrounding it. The technology has come a long way in recent years. Apple demonstrated this with the release of Siri, the speech-enabled digital assistant on the iPhone 4S. Not to be outdone, IBM flexed its speech technology muscles when Watson, the company's talking machine, used NLU to beat two Jeopardy! game-show champions.

While these technologies have piqued many consumers' interest, there's certainly a lot of room for improvement. In fact, in his recent book, The Voice in the Machine, Roberto Pieraccini aptly points out the challenges of modeling speech-enabled machines after human speech—something we don't yet fully understand. He suggests that comparing talking machines to humans is analogous to comparing airplanes to eagles. To see his explanation, read The Great Divide, which we've excerpted from his book. His perspective will help speech technology buyers and builders get the most out of speech technology by setting appropriate expectations.

Despite its limitations, however, the technology continues to mature. Recently, for example, there have been developments in emotion detection. During the natural course of a conversation, people commonly become emotional, especially on a customer service call. Speech-enabled machines have traditionally been poor at picking up on callers' emotional cues, which can add to a caller's frustration. Fortunately, the W3C Multimodal Interaction Working Group made progress in February with the first draft of the EMMA 1.1 specification. Updates include support for human annotation and better integration with Emotion Markup Language. Not only will emotion detection improve customer interactions, but the specification will also enable organizations to detect emotions across multiple modalities and channels, helping them make better use of newer communication channels.

This brings me to my second point about fluidity. There's no denying that customers use multiple channels to communicate with companies. Sue Ellen Reager's column, Blasts Heard Round the Globe, points out that Decooda, a text analytics and market research company, can send personalized voice messages to prospects and customers across multiple channels. Clearly, multimodal and multichannel technologies have moved out of the hype cycle and into practice. Our feature story, Creating Fluidity Between Channels, by Michele Masterson, underscores this shift and the need for multichannel customer support. In short, it's not enough to make the interaction as fluid as possible; organizations must also let customers move freely across channels without losing data. It's an area that will undoubtedly receive much more attention going forward.