Machine Learning and Innovative Speech Applications Dominate Conversational Interaction Conference
“The electric light did not come about by continuous improvement of the candle.” — Oren Harari, Former business professor, author, and columnist
By using this quote in his presentation, Peter Voss, Aigo.ai, set the tone for the Conversational Interaction Conference, held March 11-12, 2019, in San Jose. The conference emphasized the new AI tool, machine learning, and how it has caused voice and text dialog management to leapfrog forward.
The conference emphasized usable insights and advances in speech technologies (automatic speech recognition, text-to-speech synthesis, sentiment extraction, speaker identification) in conjunction with machine learning technologies now available for commercial use in chatbots and virtual assistants. The presentations fell into three general categories:
- innovative applications using the latest speech technologies
- alternative architectures to ease the creation of conversational apps
- best practices for designing speech dialogs
I have included the presenters’ names so you can retrieve additional details from their presentations, which will be posted soon on the AVIOS website (pull down the Public Presentations tag, and select Conversational Interaction 2019).
Innovative Applications
I was excited about the number and variety of new, innovative conversational dialog applications and how they will change people’s lives. In his keynote lecture, Dr. Michael McTear from Ulster University discussed chatbot applications which provide companionship for the elderly, monitor the elderly at home, and provide support for people in developing countries.
Jesse Montgomery, Theatro, described a wearable device that not only helps sales employees access store systems and connect with each other in real time, but also provides additional training, reminders about best practices, tips about seldom-used features, and introductions to newly released features.
Heloisa Candello, IBM Research, described Café com os Santiagos, an art installation where visitors converse with three chatbots that portray three characters who discuss different viewpoints in the book. This type of application could provide additional insight into museum exhibits and travel landmarks, as well as all types of art and literature.
John Swansey, AgVoice, described a hands- and eyes-busy application (data collection about milk cows), a hands-busy app (data collection about corn plants), and other voice applications for managing food production. These types of applications will help farmers produce food for the world’s expanding population.
Dee Kanejiya, Cognii, described how a learning assistant captures students’ explanations and descriptions of complex topics, analyzes them, and provides feedback for improvement. This type of application goes beyond the “lecture and quiz” of many of today’s educational applications.
Alternative Architectures
Peter Voss identified three architectural waves for speech applications: writing rules; using AI and machine learning to extract information from large amounts of labeled data; and applying a cognitive architecture that is modeled after the human brain. Voss emphasized that many other fields within AI could improve interaction between humans and machines, including technologies that remember, learn, understand, reason, and personalize. Several university prototypes and a few commercial cognitive products are available today.
It was also clear that conversational systems now have to support a variety of languages. Lisa Michaud, Aspect Software, described an interlingua architecture in which dialog logic, intent classification, and slot-filling are language-independent. This enables a design approach that builds for multiple language groups in a single design effort, which is easier than adding new languages to an existing system.
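To make the interlingua idea concrete, here is a minimal sketch in Python. It is not Aspect Software's implementation; every name in it (`CONCEPTS`, `TEMPLATES`, `to_interlingua`, `classify_intent`, `dialog_logic`, `respond`) and the toy banking vocabulary are assumptions made for illustration. Only the first and last steps know which language is in use; intent classification and dialog logic work on a shared, language-neutral representation.

```python
import re
from typing import Dict, Set, Tuple

# Language-specific layer (hypothetical data): surface words -> neutral concepts.
CONCEPTS: Dict[str, Dict[str, str]] = {
    "en": {"balance": "BALANCE"},
    "es": {"saldo": "BALANCE"},
}

# Language-specific layer (hypothetical data): response templates per language.
TEMPLATES: Dict[str, Dict[str, str]] = {
    "en": {"balance_reply": "Your balance is {amount}.",
           "fallback": "Sorry, I did not understand."},
    "es": {"balance_reply": "Su saldo es {amount}.",
           "fallback": "Lo siento, no entendí."},
}


def to_interlingua(text: str, lang: str) -> Set[str]:
    """Map an utterance to a set of language-neutral concept tokens."""
    words = re.findall(r"\w+", text.lower())
    table = CONCEPTS[lang]
    return {table[w] for w in words if w in table}


def classify_intent(concepts: Set[str]) -> str:
    """Language-independent intent classification over neutral concepts."""
    return "check_balance" if "BALANCE" in concepts else "unknown"


def dialog_logic(intent: str) -> Tuple[str, Dict[str, str]]:
    """Language-independent dialog logic: pick a response key and its values."""
    if intent == "check_balance":
        return "balance_reply", {"amount": "$42"}  # placeholder value
    return "fallback", {}


def respond(text: str, lang: str) -> str:
    """Run the full pipeline and render the reply in the user's language."""
    key, values = dialog_logic(classify_intent(to_interlingua(text, lang)))
    return TEMPLATES[lang][key].format(**values)


print(respond("What is my balance?", "en"))
print(respond("¿Cuál es mi saldo?", "es"))
```

In this sketch, supporting another language means adding one entry to each of the two tables; `classify_intent` and `dialog_logic` stay unchanged.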
Anish Mathur, IBM Watson, suggested that conversational engines should work together with AI search engines. If the conversational engine is unable to answer the user’s request, then the request should be passed on to an AI search engine to obtain additional knowledge about the request.
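The hand-off described here could look roughly like the following Python sketch. It does not use the IBM Watson API; the function `answer`, the stand-in engines `faq_engine` and `toy_search`, and the `FAQ` table are all hypothetical, and the rule that "unable to answer" is signaled by returning `None` is an assumption of this example.

```python
from typing import Callable, Dict, Optional

# Hypothetical knowledge the conversational engine can answer directly.
FAQ: Dict[str, str] = {
    "what are your hours?": "We are open 9 to 5.",
}


def faq_engine(request: str) -> Optional[str]:
    """Stand-in conversational engine: returns None when it has no answer."""
    return FAQ.get(request.strip().lower())


def toy_search(request: str) -> str:
    """Stand-in search engine: a real system would query a search service."""
    return f"Search results for: {request}"


def answer(request: str,
           conversational_engine: Callable[[str], Optional[str]],
           search_engine: Callable[[str], str]) -> str:
    """Try the conversational engine first; fall back to search if it fails."""
    reply = conversational_engine(request)
    if reply is not None:
        return reply
    return search_engine(request)


print(answer("What are your hours?", faq_engine, toy_search))
print(answer("Tell me about tides", faq_engine, toy_search))
```

Passing the two engines in as functions keeps the fallback rule separate from either engine, so either one could be swapped out without changing `answer`.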
Best Practices for Designing Dialogs
Wolf Paulus, Intuit, recommended using emotion recognition and emotion synthesis in dialogs. The user's emotional state is detected by observing user behavior (voice, photo/video, touch, click) in real time, and the system responds using the appropriate emotion. Even we humans have difficulty detecting and responding to the emotions of others, so extensive research will be needed to develop definitions and guidelines for detecting and synthesizing human emotions.
Brielle Nickoloff, Witlingo, listed ten lessons she and her team have learned working with clients while developing voice-only skills. Nickoloff encourages every team manager who develops commercial speech applications to review her ten suggestions for designing and launching voice applications.
Bill Meisel, TMA Associates and organizer of the conference, summed things up: “The Conversational Interaction Conference showed the wave of creativity and investment in products and services made possible by language technologies having passed the ‘tipping point’ of utility. Companies are clearly finding many ways that this rapidly developing capability can make our connection with digital systems tighter and more intuitive.”