February 17, 2020
By James A. Larson, program co-chair, SpeechTEK 2021
Speech Technology News

Technology Is Making Dialogues More Life-Like, Conversational Interaction Speakers Note

Improved simulation of human voices, enhanced dialogue structures, and advanced analysis, design, and construction of user experiences are dominating the voice market, speakers confirmed at this year's Conversational Interaction Conference, held Feb. 10-11 in San Jose, Calif.

These advancements enable the creation of new types of applications and improvements of existing applications, speakers said.

In his keynote, Xuedong Huang, technical fellow and chief speech scientist at Microsoft, painted a picture of how new technology can be used to improve interaction among multiple individuals either co-located or remotely distributed.

While Huang spoke in Scottish-English with a Chinese accent, his presentation was transcribed and displayed in real time at the bottom of the presentation screen. He demonstrated not only transcription but also real-time translation into any of several languages.

Next, Huang showed videos of goggles that overlaid virtual-reality images on real-life scenes to provide a step-by-step training guide in the language of the user's choice. Then, he added an image of the speaker to the mix. These capabilities can transport speakers across distances and times to multiple participants.

The same technology that can synthesize the voice and image of real people could also be misused to present realistic fake news, he warned, pointing out that the speech community must work with regulatory agencies to determine how best to control the use of these new technologies and prevent their misuse. Huang assured the audience that Microsoft will only sell this technology to responsible parties.

There are, however, many situations where pictures, text, and graphics can enhance and simplify verbal dialogues. While speech is appropriate for speaking commands and asking questions, long menus and complex results are best presented visually. Thus, visual displays have been added to smart speakers. Displays also support video calls. Guidelines for when to speak and when to display are beginning to emerge. A compelling example of this was demonstrated by Victoria Livschitz, founder and chief technology officer at Grid Dynamics. She recommended enhancing verbal-only shopping dialogues by replacing verbal descriptions of flowers with pictures. I anticipate that many voice-only dialogues can be enhanced with visual components. This could be a trend away from voice-only toward multimodal, except for in-car use and other situations where eyes are truly busy.

Bill Meisel, president of TMA Associates and a conference organizer, predicted that just as websites and smartphone apps became critical channels between customers and companies, company-specific digital assistants will use new technologies, such as artificial intelligence and conversational interaction, to improve conversations. While some of us prefer using voice, younger customers might prefer text-based conversations.

Phil Gray, executive vice president of business development at Interactions, suggested, "Chatbots are the new websites." Newer technologies, he said, will pave the way for new company/customer channels, such as virtual and augmented reality.

In response to a user query, a Google search returns descriptions of multiple documents, so the user must select and retrieve documents to find the answer to the query. More recently, Google also returns a one- or two-sentence answer to the user query. Praful Krishna, CEO of Coseer, suggested that answers to users' requests should be highlighted within the documents that contain them. Thus, the user can directly see answers in the context of the retrieved documents.

The current state of conversational design and implementation is like the Wild West, with incompatible approaches, tools, and platforms. Each vendor tries to make its platforms more desirable by adding new features. Deborah Dahl, principal at Conversational Technologies, observed that standards are needed to create a level playing field and improve portability and interoperability. She enumerated several standards groups that are tackling interoperability and cross-platform development.

But in the end, the key takeaway was that in the rapidly changing environment, new design strategies, tools, and technologies are merging to improve existing applications and enable new and exciting applications.
