Speech Technology Magazine

 

Assessing Speech-Enabled Help Apps

Users seeking voice-enabled service and support have several choices.
By James A. Larson - Posted May 1, 2013

When users try to install, diagnose, repair, adjust, or use their products and something goes wrong, where do they turn for answers? Depending on the product, they might rely on a product manual, a Web site, or even help embedded in the product (e.g., software) itself. But when it comes to getting help from solutions that use speech technologies, there are four options: interactive voice response (IVR) applications, mobile device help applications, multimodal applications for mobile devices, and intelligent agent software. Following is a look at all four.

Interactive Voice Response

Users can--and do--access IVR systems from any telephone or mobile device. Building these applications, however, requires developers to learn VoiceXML and other specialized languages, and designing verbal user interfaces that are easy to use demands special skills.
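To sketch what that specialized development looks like, here is a minimal VoiceXML dialog for a product help line; the field name, prompt wording, and menu choices are illustrative, not drawn from any particular deployment:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="product_help">
    <field name="topic">
      <!-- The system speaks this prompt, then listens -->
      <prompt>Say install, repair, or billing.</prompt>
      <!-- An inline SRGS grammar constrains what the recognizer accepts -->
      <grammar type="application/srgs+xml" root="topic" mode="voice">
        <rule id="topic">
          <one-of>
            <item>install</item>
            <item>repair</item>
            <item>billing</item>
          </one-of>
        </rule>
      </grammar>
      <!-- Runs once the caller's answer fills the field -->
      <filled>
        <prompt>Transferring you to <value expr="topic"/> support.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Even this toy example shows why IVR work is a specialty: the developer must write both the dialog flow and a recognition grammar, and anticipate how callers will actually phrase their requests.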

People calling into IVRs frequently complain about the time required to use the applications due to long verbal prompts and time-consuming navigation. Some companies require callers to remain in the IVR system and don't allow them to switch to a human agent. Some are extending their IVR systems to present information as pictures, graphics, and video in addition to synthesized speech and prerecorded audio.

IVR systems are not a long-term solution. They will remain in use until the majority of customers obtain mobile devices. Future development of voice-only IVR applications seems limited because they are being supplemented or replaced by the following sources of help.

Mobile Device Applications

Many companies develop and deploy their own customer support applications on Android, iPhone, or other mobile devices. These apps deliver via text and graphics the same information that IVR applications provide via voice. However, mobile developers find it difficult to author a single application that can be deployed on more than one type of device. Standard languages, best practices, and development tools that generate portable code across device types are needed to overcome this problem.

Multimodal Applications

Multimodal applications integrate speech and handwriting recognition directly into mobile applications. Most mobile applications accept only keyboard and touch input. However, keyboard replacements support additional forms of input, such as Access' Graffiti, which accepts handwriting, and Nuance's Flex T9 and Swype, which accept voice and other user input. These keyboard replacements effectively turn mobile GUI applications into multimodal applications in which the user types, writes, gestures, or speaks to enter data. Eventually, handwriting and speech recognition will become part of the operating system and be available to all applications, including help.

The World Wide Web Consortium's Multimodal Interaction Working Group has published a standard architecture for multimodal applications that accepts input from many sources, including text, voice, video, touch, GPS, and device orientation. This approach has been adopted by Openstream and Angel.com but has yet to be embraced by other mobile application platforms.
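In the W3C MMI architecture, an interaction manager coordinates modality components (voice, touch, handwriting, and so on) by exchanging standardized life-cycle events. As a rough sketch, the interaction manager might ask a voice component to start a help dialog with an event like the one below; the event and attribute names follow the W3C MMI Architecture specification, while the component names, context identifier, and URL are purely illustrative:

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- Interaction manager asks the voice modality component
       to start a spoken help dialog within an existing context -->
  <mmi:startRequest source="InteractionManager" target="VoiceModality"
                    context="ctx-1" requestID="req-1">
    <mmi:contentURL href="http://example.com/help-dialog.vxml"/>
  </mmi:startRequest>
</mmi:mmi>
```

Because every modality component speaks the same event protocol, a help application can add or swap input modes without rewriting its core dialog logic.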

Intelligent Agent Software

Intelligent agents, such as Siri, are a new class of applications that perform functions similar to those of human agents. They promise to support natural language requests, processing them with a variety of artificial intelligence techniques: powerful parsing, access to context and history, advanced search, statistical processing, and clever multimodal dialogue management. New agents appear almost daily on mobile devices and will soon find their way into other types of devices, including desktops, laptops, home appliances, cars, and game platforms. Useful intelligent agents remain difficult to build because of the complexity of the user interface and the need to access multiple sources of information. Currently, most are sophisticated search engines that accept voice and present results as text and graphics, but they will become multimodal in the future.

New approaches for providing owners with help are being deployed. The rush to GUI-based mobile applications will continue. Help systems will become multimodal, accepting requests through a variety of input techniques and presenting responses in the medium most appropriate for the owner. And intelligent agents will deliver sophisticated assistance that is both easy to use and effective.


James A. Larson, Ph.D., is an independent speech consultant. He is co-program chair for SpeechTEK and teaches courses in speech user interfaces at Portland State University in Portland, Oregon. He can be reached at jim@larson-tech.com.

