August 30, 1997
By Michael Harris
Features

Put Your Best Voice Forward

Organizations everywhere are turning up the volume on speech recognition applications. Charles Schwab and Co. recently announced a new application that will recognize the spoken names of more than 13,000 stocks, mutual funds, or market indicators to provide callers with up-to-the-minute quotes. Carnegie-Mellon University is developing an application that will make Pittsburgh police officers safer drivers by enabling them to speak vehicle registration numbers into a national vehicle tracing system while they're behind the wheel.

According to a recent Ovum report on business opportunities in computing and telephony, worldwide revenues for systems providers in the voice processing market will grow from $2.1 billion in 1996 to $5.8 billion in 2001. Telcos are expected to be the major beneficiaries of developments in this market: if they provide their own subscriber services, including voice-activated dialing, voice mail, and automated directory services, they are predicted to generate revenues of $20.1 billion by 2001.

One of the key contributors to this growth is the development of usable speech recognition technology. As it becomes an increasingly viable form of communication, forward-thinking businesses are incorporating it into more and more applications. But like any technology still in its infancy, speech recognition poses many hurdles, including designing the most effective user interface, or personality, for each application. How can businesses anxious to adopt speech recognition tell if their applications exhibit the right personality?

The personality you choose for your speech recognition applications is part of the public face you present. Like your company logo, product packaging, and even the way your receptionists answer the phone, your speech recognition systems contribute to the public perception of your company. Because of that, it is critical to apply a marketing discipline to their design. Segment the application's users to help you create the most appropriate personality. Use any available demographic and psychographic data to understand what types of callers use the service, and factor in how the service will be used.

For example, cellular service providers with a customer base composed predominantly of business users already know two important factors about their customers: they are psychologically in "business mode" during every transaction, and they are paying for air time by the minute. That means they want a personality that provides brief, businesslike prompts that allow the call to be completed as quickly as possible. Contrast that with a call center application serving a much broader range of callers, many of whom are inexperienced users who are easily intimidated, or even offended, by a terse, businesslike personality.

A good place to start is to recognize that the spread of speech recognition creates a new and much higher set of expectations on the part of callers. Conventional IVR systems ("Press 1 to place an order, press 0 to speak to an operator...") are very mechanical, in both flow of control and interface. But when the voice on the other end of the telephone can recognize speech, callers expect it to behave like the only other entity that can do that: a human being. Because of that, they want a natural, non-stilted conversational flow that doesn't rely heavily on instructional prompts. In other words, callers expect someone they would enjoy talking to.

That is a highly subjective concept. How do you choose a personality that appeals equally to every caller? Speech recognition systems can run the full gamut, from offering a friendly, chatty personality to one that is matter-of-fact and businesslike. Unfortunately, there's no easy answer. Interface design for speech recognition is still a very young discipline, and it encompasses many complex elements.

The Well-Designed Interface

In addition to applying marketing segmentation strategies, creating the right speech recognition personality involves adhering to sound user-interface design principles. The well-designed application is intuitively easy to use, coaches the user when needed, and can adapt to varying levels of expertise.

To promote ease of use, it's important to encourage users to employ a limited vocabulary, guiding them to provide an answer the system will understand without reverting to explicit prompts. Avoid the temptation to recreate a conventional touch-tone IVR application, with voice prompts such as "Say 0 to speak to an operator. Say 1 to place an order." A more natural prompt would be "Would you like to place an order or speak to the operator?"
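To make the idea of a limited vocabulary concrete, here is a minimal Python sketch of a small vocabulary for the prompt "Would you like to place an order or speak to the operator?" It assumes the recognizer returns a plain-text transcript; the `GRAMMAR` table, the intent names, and `match_intent` are hypothetical illustrations, not any vendor's API. A production system would more likely constrain the recognizer itself with a grammar; this sketch only shows how a few natural phrases can map to each choice without resorting to "Say 1" style prompts.

```python
import re
from typing import Optional

# Hypothetical vocabulary for the prompt "Would you like to place an order
# or speak to the operator?" Each intent accepts a handful of natural phrases.
GRAMMAR: dict[str, tuple[str, ...]] = {
    "place_order": ("order", "buy", "purchase"),
    "operator": ("operator", "agent", "a person", "somebody"),
}


def match_intent(transcript: str) -> Optional[str]:
    """Return the first intent with a phrase in the transcript, else None."""
    # Normalize to lowercase words so punctuation and case do not matter.
    words = re.findall(r"[a-z']+", transcript.lower())
    padded = " " + " ".join(words) + " "
    for intent, phrases in GRAMMAR.items():
        # Pad with spaces so "order" does not match inside "border".
        if any(f" {phrase} " in padded for phrase in phrases):
            return intent
    return None  # out of vocabulary: hand off to error handling
```

A `None` result is the signal to fall into the error-handling path rather than to reprompt with a list of numbered options.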

Build in effective error handling. As with any new technology, callers are likely to become frustrated when recognition errors occur, and frustrated callers are more prone to terminate the call. Error handling is the most important factor in a well-designed speech recognition application, but it is also the hardest to master.

A major contributor to successful error handling is your user help system. Help can come in two forms: in-line and coaching. In-line systems are accessed via an intuitively obvious phrase such as "help" and provide the caller with a clear list of instructions on how to use the system. Coaching is achieved by providing prompts that guide the user toward the right answer. A good example of this is a prompt that says, "I'm sorry. I didn't understand what you said. Did you say 'place an order'?" If the caller answers no, the system can branch to the next most likely response or go to an explicit help prompt.

No matter how you choose to design error handling, avoid trapping the caller in a nightmarish loop of "I didn't understand you. Please repeat your answer" prompts, which is guaranteed to end in a terminated call and create a negative impression of your organization.
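The coaching strategy and the no-loop rule can be combined in one routine. The following Python sketch rests on stated assumptions: the recognizer returns an n-best list of candidate phrases ranked from most to least likely, the hypothetical `confirm` callback stands in for playing a yes/no prompt and recognizing the answer, and the cap of two confirmation prompts is an arbitrary choice. None of these names come from the article.

```python
from typing import Callable, Optional, Sequence

MAX_CONFIRMATIONS = 2  # assumed cap on "Did you say ...?" prompts


def coach(
    n_best: Sequence[str],
    confirm: Callable[[str], bool],
    max_confirmations: int = MAX_CONFIRMATIONS,
) -> tuple[Optional[str], list[str]]:
    """Walk the recognizer's ranked hypotheses instead of asking for a repeat.

    Returns (accepted phrase or None, prompts played). None means the caller
    has been routed to an explicit help prompt rather than looped.
    """
    prompts: list[str] = []
    for i, phrase in enumerate(n_best[:max_confirmations]):
        # Apologize once, then simply offer the next most likely response.
        lead = "I'm sorry. I didn't understand what you said. " if i == 0 else ""
        prompts.append(f"{lead}Did you say '{phrase}'?")
        if confirm(phrase):
            return phrase, prompts
    # Bounded: after the cap, fall through to explicit help.
    prompts.append("You can say 'place an order' or 'speak to the operator'.")
    return None, prompts
```

Because the number of confirmation prompts is bounded, a caller who keeps answering "no" reaches the explicit help prompt instead of hearing the same question indefinitely.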

Sophisticated speech recognition applications can actually take into account user expertise levels. Vicorp's voice activated dialing is a good example. Unlike a call center application, voice activated dialing is used by a single subscriber, and it employs deductive learning to "know" how familiar the subscriber is with its use. Vicorp's implementation has been designed to offer the user increasingly brief prompts as fewer and fewer recognition errors occur. However, if the application perceives that the subscriber's error rate has increased, it can step back to more expansive, user-friendly prompts until the subscriber becomes an expert again.
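The article does not describe how Vicorp's implementation works internally, so the following Python sketch is only one hypothetical way to get similar behavior: track recognition errors over a sliding window of recent turns, step down to a briefer prompt after a window with no errors, and step back up when the error rate reaches a threshold. The `AdaptivePrompter` class, the prompt texts, the window size, and the thresholds are all illustrative assumptions.

```python
from collections import deque

# Hypothetical prompts, ordered from briefest (expert) to most expansive.
PROMPTS = (
    "Name?",
    "Say a name to call.",
    "Please say the name of the person you would like to call, "
    "for example 'call home'.",
)


class AdaptivePrompter:
    """Chooses a prompt verbosity level from recent recognition errors."""

    def __init__(self, window: int = 5, step_down_at: float = 0.0,
                 step_up_at: float = 0.4) -> None:
        self.history: deque = deque(maxlen=window)  # True = recognition error
        self.level = len(PROMPTS) - 1  # new subscribers start fully coached
        self.step_down_at = step_down_at
        self.step_up_at = step_up_at

    def record(self, error: bool) -> None:
        """Log one recognition attempt and adjust the level if warranted."""
        self.history.append(error)
        if len(self.history) < self.history.maxlen:
            return  # not enough evidence at this level yet
        rate = sum(self.history) / len(self.history)
        if rate <= self.step_down_at and self.level > 0:
            self.level -= 1  # subscriber is doing well: shorter prompts
            self.history.clear()
        elif rate >= self.step_up_at and self.level < len(PROMPTS) - 1:
            self.level += 1  # errors rising: back to friendlier prompts
            self.history.clear()

    def prompt(self) -> str:
        return PROMPTS[self.level]
```

Clearing the history after each level change means every step up or down is based on fresh evidence gathered at the new prompt level.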

The Best Candidates

Designing the right personality for your applications is probably the toughest challenge you'll face, but it isn't the only one. Knowing when not to enable an application with speech recognition is also a challenge. Theoretically, any application that requires interactive control is a candidate. Common examples include network control applications, such as enabling a user to provide speech input to forward their phone calls; call center applications, such as systems that provide airline departure and arrival information; and enhanced services, such as voice activated dialing.

However, not every application is ideally suited for speech recognition. A major factor to consider is the cost/service tradeoff, in which the cost of both user interface design and new hardware should be weighed against the convenience, safety, and security benefits to the user and the revenue-generating potential for the service provider. According to industry estimates, a speech recognition-enabled interface can cost up to twice as much as touch-tone menu selection technology.

Again, you must look back to the demographic data and needs of your target market. For example, a speech recognition interface can be used to pamper your high-value customers, while lower-volume customers may be served most cost-effectively by a touch-tone interface.

Another alternative to speech recognition is the Internet, which is well suited to applications that benefit from the visual presentation of information. For example, subscribers can use the Web's visual interface to manage call routing and call forwarding, telephony features that can be complex to configure. This visual approach is much easier for the subscriber to comprehend than a corresponding telephony interface using touch-tone or voice recognition.

The bottom line is that speech recognition is not a panacea: whether to add it to an application depends on the user requirements and the business case you can build to meet them. Once you have made that case, designing the perfect personality for those applications is no easy task. You have to know your target audience and create an interface that they will find intuitive and appealing to use. Get it wrong and you run the risk of alienating your user base. But with the right personality, your applications can really go places.

Michael Harris is director of product marketing at Vicorp, a Precision Systems company and a leading provider of software-based products for global wireline, wireless, and private network providers. For more information, call 813.572.9300, extension 3209.