At the recent SpeechTEK conference, a group of VUI specialists spent a day defining a set of success criteria for speech-enabled user interfaces. We necessarily limited our focus to criteria that we could measure with defined metrics. However, this leaves out one vital factor in the success of a VUI: the role of expectation. I’m referring to the expectations that users have when they interact with a speech-enabled application, the expectations that organizations have when deploying new speech technology, and even the expectations we purveyors of speech technology bring into the mix. Expectations are by definition soft: they are based on previous experiences with speech and other technologies, and are expressed as attitudes, preferences and assumptions, all things that are notoriously hard to measure. Yet what we believe about speech technology has a huge impact on how successful a speech project will be.
Users’ expectations are one of the most powerful factors in determining how they approach an interaction with a speech system. The complication is that many users don’t have strong expectations of speech technology. Expectation is built on experience, and most users have very little, so they are trying to figure out the “rules of engagement” as they go along. Most users don’t know exactly what to say to a speech system or how to say it. This is why accurately capturing the nuances of spoken conversation in the dialogue structure and prompting is so vital. If you pose the right questions, and ask them in a way that is comfortable and intuitive, users can simply respond without over-analyzing the interaction. When prompting captures the style and structure of human-to-human spoken language, users fall back on over-learned and largely unconscious rules about spoken conversation. Otherwise, we leave them in the very vulnerable position of trying to figure out the limits of the technology rather than just trying to accomplish their goals.
Expectations of Organizations
The expectations of organizations deploying speech can also determine the ultimate success of these projects. Too often the owners of speech projects are woefully misinformed about what speech technology can do and when to use it. This frequently shows up as overzealous enthusiasm for open-ended “how may I help you?” prompting. Yes, so-called “natural language” can be done successfully, but it is not the answer to every problem, and it requires a significant investment to deploy well. The result of such misplaced enthusiasm can be misguided speech projects that users avoid and that fail the organizations deploying them.
These comments may seem negative coming from someone who makes a living peddling speech to organizations, but setting realistic expectations is in my best interest as well. Speech done poorly is bad for users, bad for organizations and bad for us speech folks! Smart speech systems plan for the failures they will inevitably encounter. By anticipating that not every interaction will be completed successfully, we can design systems that retain information and elegantly pass it on to live agents who can help users complete their tasks.
What We Expect
Finally, we need to look at our own expectations in the speech community. We sometimes expect speech to be such an awesome experience that users will automatically want to use it all the time. Not necessarily so! More and more users move freely between Web and telephone interactions, choosing based on their physical location or other demands of the task rather than on the technology itself. Users like speech when it’s of value to them: think eyes-busy, hands-busy situations like driving a car or walking from the train station. Those of us who design and build speech applications need to be cognizant that we’re building only one of a suite of tools that users have for communicating with an organization.
Setting the Bar
We are by no means doomed by expectations. By understanding the expectations of users and organizations, we can create speech systems that serve us all well.
- Do your best to understand users’ expectations before designing the VUI.
- Help organizations understand how speech technology really works, what’s required to make it work right, and when to use it and when not to.
Susan L. Hura, PhD, is the founder of SpeechUsability, a consultancy focusing on improving the user experience by incorporating user-centered design practices in speech technology projects. Susan founded the usability program at Intervoice as their Head of User Experience, and is a member of the Board of Directors of AVIOS.