Will Unified Messaging be the Beachhead Opportunity for Conversational Voice User Interfaces?

That speech technologies represent a market poised for tremendous growth is scarcely subject to debate. The precise form that the emerging market will take is still, however, somewhat unknown. Some believe that speech application users will demand essentially unrestricted conversational user interfaces. But is this belief supported by the facts?

To date, several "Virtual Assistant" products have been fielded which tout "conversational dialog" as the basis for user interaction. Typically, their interfaces pose an apparently open-ended question to the user, such as "How may I help you?" Users are expected to respond to these prompts in highly variable ways, and the systems are supposed to be able to respond accordingly. Unfortunately, these Virtual Assistant systems have enjoyed only marginal commercial success. One assumes that their lackluster commercial performance is directly representative of a failure in user acceptance.

If it is the case that users have shied away from these conversational dialog systems, why might this be so? I would say that the answer is not absolutely clear. On the other hand, I strongly suspect that the problem lurks in the relatively large gap between the expectations that such conversational dialog interfaces engender and the performance that they can reliably deliver. Human dialog is simply too unpredictable to be comprehensively captured in pre-designed and ultimately state-specific grammars. This invariably leads to out-of-grammar user utterances, which in turn lead to recognition failure recovery prompts. Recurrent recognition failures are the fastest way to ensure user frustration and to encourage users to quickly adopt strategies to avoid the experience. Typically, users learn the most reliable path to successfully complete the tasks they wish to perform. The opportunity to emit their requests in thousands (if not millions) of ways becomes completely irrelevant in that their utterances tend to become telegraphic.
For example, rather than risk a recognition failure upon saying, "Gee, I think I'd like you to tell me about the first item on my to-do list for tomorrow morning," the experienced user might be more likely to say, "Read to-do list tomorrow."

Yet a crucial factor in the limited success of the conversational interface has to do with the person or entity making the buying decision. Most of the Virtual Assistant products were sold to individuals as services. Thus, it was the individual user whom the vendor had to please. In the Unified Messaging (UM) space, this is not true. Users do not make the buying decision: their company makes it. Usage of the unified messaging system may even become a requirement of continued employment, regardless of its shortcomings or the personal frustrations encountered from its use.

Many, if not most, companies currently have some sort of DTMF-based voice mail system. My experience is that most users do not particularly like interacting with these systems, but they learn the keystrokes and shortcuts that get them where they need to go and get on with their busy days. Replacing these systems presents a unique opportunity for the conversational voice user interface. Given the captive nature of the user class, user motivation will have greater immunity to the experiences that can discourage the individual user. More importantly, users initially are likely to perceive a voice user interface more favorably than any DTMF alternative, regardless of its possible shortcomings. If UM dialog designers learn the habits of their successful users and alter their grammars in ways to promote more effective interactions among the greater user population, their systems are likely to be well received.

But if usage tends to discourage greater conversational interaction, how will UM promote the conversational dialog model? By keeping users engaged long enough for designers to learn their collective conversational requirements.
It may well be the case that UM designers initially discover the minimalist requirements for their users' interactions. But if designers carefully observe the behavior of their user class, more elaborate, fault-tolerant conversational systems will soon follow.

Unified Messaging has itself been a market of false starts and unimpressive growth. After making an address at a speech industry conference last year on the status of the market, I was asked by an investment analyst when I thought the UM market might actually take off. My answer then, which I stand by now, was that UM has, to date, lacked a simple, universal user interface. This gap represents a unique and promising opportunity in the evolution of conversational voice interfaces. And successes in the UM space will foster advances in the overall practice.

Dr. Walter Rolandi is the founder and owner of The Voice User Interface Company in Columbia, SC. Dr. Rolandi provides consultative services in the design, development and evaluation of telephony-based voice user interfaces (VUI) and evaluates ASR, TTS and conversational dialog technologies.

