January 9, 2004
By Michael Cohen, Manager, Speech Technology Group - Google, Inc.

Learning from our Successes

In the past decade, there has been an explosion in the creation and commercial deployment of voice user interfaces (VUIs), especially for use over the telephone. At the beginning of this “decade of growth” (starting around 1994), the biggest obstacle that needed to be overcome was skepticism about the capabilities of the technology. Within a few years, skepticism dissipated as millions of phone calls every day were being handled successfully by speech technology. Although technology improvement will continue to play a key role in providing better experiences for end users, increased business value to enterprises and new capabilities enabling new types of applications, it is no longer the key bottleneck to growth of the speech industry.

The biggest challenge now is the design of the user interface. There are too few practitioners who have the knowledge and skills to create all the systems needed and to advance our understanding as new technology enables new capabilities. More importantly, there is still much to learn. Significant progress has been made, advancing both the art and the science of VUI design. However, to realize the great potential of our industry, we must accelerate that progress … but how?

We often hear the adage “learn from our mistakes.” However, an even more important principle, too often neglected, is “learn from our successes.” It is far more efficient to repeat a successful pattern than simply to eliminate one more unsuccessful pattern from the infinite array of possibilities.

As we look to advance the practice of VUI design, where can we turn for examples of successes at similar endeavors? The obvious place to start is the dramatic example of the advances in core speech technologies that made our current industry possible. There are at least three important lessons that can be drawn from our successes at advancing speech technology.

Focus on the Fundamentals
For many years, the dominant focus of the speech recognition research community was on accuracy. Until a base level of accuracy was achieved with the technology, the emergence of a speech industry was inconceivable. It is still the case today that both task completion and end-user satisfaction are highly correlated with accuracy. As users become more mobile and demand more accommodating and powerful applications, accuracy improvements will continue to be crucial.

What are the underlying fundamentals for effective VUI design on which we need to focus? First, there is a need to achieve a deeper understanding of the cognitive challenges faced by our callers. Many design decisions depend on an understanding of the limitations of short-term memory and the ability of callers to learn new concepts and attend to important information. Although all user interfaces pose cognitive challenges, speech interfaces create special cognitive challenges given the transient or non-persistent nature of speech (you hear it, and then it is gone).

Next, we must understand better the role of linguistic and conversational context. Humans share many conversational conventions, assumptions and expectations that support spoken communications. Although largely unconscious, these shared expectations are key to effective communication, whether human-to-human or human-to-machine. Most other types of user interfaces depend on specific learned actions designed to accomplish the task at hand (e.g., choosing operations from a toolbar or dragging and dropping icons). VUIs depend on the user’s innate spoken language skills. As designers, we don’t get to create the underlying elements of conversation. Therefore, the VUI designer must work on the user’s terms, with an understanding of the user’s conversational conventions.

We should devote as much effort to studying users’ cognitive challenges and elucidating linguistic and conversational conventions as has been devoted to speech recognition accuracy.

Apply Rigorous Methodology
The need for rigor is certainly clear at the level of individual applications: experience has shown the key role of a well-defined design methodology and rigorous end-user testing. However, rigor is equally necessary if we want to advance our basic understanding of VUI design.

The speech technology research community has a culture of careful scientific experimentation and analysis of results. DARPA (Defense Advanced Research Projects Agency) has enforced a rigorous methodology for testing and comparing results of different technical approaches tried at different research centers. Many of the important advances of the last 20 years are attributable to this requirement for stringent analysis and testing. It will require equally rigorous work to advance our understanding of VUI design issues.

Luckily, for network-resident applications, we have the advantage of access to huge amounts of in-service data: real users performing real tasks with real systems. No other user interface modality has the advantage of such a wealth of real usage data. Not even the clickstream data available from Web applications approaches the richness of the speech data available to us. This treasure is wasted if we fail to apply rigorous methods.

Create a Community of Researchers and Practitioners
The role of the DARPA program in providing a forum for exchange and comparison of ideas, in the form of regular workshops and associated publications, was key to the advancement of speech technology. When it comes to the issues behind real-world deployment of systems based on a voice user interface, the key sources of information and forums for the exchange of ideas have been the AVIOS/SpeechTEK Conference and the International Journal of Speech Technology. AVIOS is a 22-year-old professional membership organization dedicated to providing resources to the speech community that will help create quality applications of speech technology. Join us at www.AVIOS.org.

Dr. Michael Cohen is a cofounder of Nuance Communications and serves on the board of directors of AVIOS. Some of the material in this article is drawn from Voice User Interface Design (2004, Addison-Wesley), by Cohen, Giangola, and Balogh, with permission (see www.awprofessional.com/titles/0321185765 for ordering info).
