A Helping Hand in Voice Search
The Applied Voice Input/Output Society (AVIOS) is organizing the Voice Search Conference (www.voicesearchconference.com) in March, signaling its belief in the importance of this new trend. What is voice search, and why does AVIOS feel it is significant enough to warrant a new conference?
Voice search includes important applications such as directory assistance, which automates voice access to databases. Searching audio files with text queries, where the content of the audio is derived through speech recognition, could also be viewed as a form of voice search. But voice search is much more than the sum of its applications.
Voice search makes finding things easier, like a helpful assistant. In its most direct implementation, it allows a just-say-what-you-want model of user interaction.
Voice search isn’t just a voice user interface (VUI)—it may be integrated with a graphical user interface (GUI). For example, a multimodal application could deliver text in response to a voice request. The point could even be to deliver speech as searchable text, as in voicemail-to-text services.
Further, on mobile phones, there are times when it isn’t appropriate to speak, such as in meetings or when privacy is a concern. A GUI or text-based alternative provides a way of accomplishing the task, although perhaps not as efficiently as a speech interface. The GUI need not require a separate learning curve; it can be driven by the voice search paradigm—just type what you would say.
Live agents can provide another extension of the voice search paradigm. Having "hidden agents" assist speech recognition behind the scenes, for example, is an effective—and increasingly used—approach. The result is increased confidence in automated services since the user is often not aware that an agent was involved. Furthermore, data collected during agent-assisted services can be used to tune the application and increase automation rates.
So why not just use a GUI? GUIs were highly intuitive and effective when applications had fewer features. We had less data to deal with, and Web sites were fewer and simpler. But today’s complexity strains the paradigm and often leads to deep, difficult-to-navigate hierarchies. Search boxes are a partial answer to this increased layering, but they too often demand time-consuming navigation as one wades through a long list of alternatives to find truly relevant information.
The power of GUIs is that we understand them based on experience. We can usually find what we want eventually without recourse to a manual. But GUIs don’t fully address today’s complexities, particularly when used on small mobile devices. A say-what-you-want speech interface, supported by quick speech dialogue when clarification is required, is highly effective for dealing with those complexities.
So why hasn’t voice search taken over mobile devices and even PCs? Speech technology has only recently reached the point where it has the ability and credibility to support an unstructured interaction.
It’s annoying when one goes to a company Web site and doesn’t find a search box specific to that site; we’ve grown accustomed to the convenience of it. The same will be expected of call centers. Once customers realize they can just say what they want and get answers quickly, they will expect that when they call companies. Natural-language call routing or information retrieval is a proven way of supporting this expectation.
Voice search is a summary name for a general approach to interacting with users. A full voice search philosophy of user interface design can include:
• users assuming they can just say what they want;
• a single point of entry for many applications and information sources;
• the use of quick speech dialogue to resolve ambiguities in the request or to narrow the possible options;
• avoiding overly structured hierarchical navigation schemes;
• on a device with a voice search interface, an option to type what you would say when speech isn’t usable;
• agent backup (or at least helpful prompts) when the user is not succeeding in a request; and
• the accessibility of speech as text.
Users might view this set of features as voice search. If a summary name helps form an accurate mental model of what one can do using speech, then that model can be leveraged more easily in both services and enterprises.
William Meisel, Ph.D., is president of TMA Associates, an independent speech technology consulting firm, and editor and publisher of Speech Strategy News. He can be reached at firstname.lastname@example.org.