Moving the Conversation Forward
At SpeechTEK West in San Francisco last year, a group of voice user interface designers participated in a workshop directed by James Larson of Larson Technical Services and Lizanne Kaiser of Genesys Communications Laboratories. Participants developed recommendations for dealing with error messages in speech applications from four perspectives. This is the third in a series of four articles summarizing the recommendations reached by these experts. Contributors to this article are:
• Karen Kaushansky, Tellme Networks, a Microsoft subsidiary;
• Juan Gilbert, Auburn University;
• Susan Hura, SpeechUsability; and
• Silke Witt-Ehsani, TuVox.
When error handling in a voice user interface is suboptimal, the system's responses can send users into a downward spiral and leave them feeling trapped in a bottomless pit. Error prompts can magnify the problem: instead of moving the conversation forward and closer to completing the user's intended task, they can make the user feel stuck. One example is an error prompt that generates user responses that are not recognized. Reprompting with the same wording only encourages users to rephrase their responses and continue to be misrecognized; it does little to address the cause of the misrecognition or to help the user get back on track. The same thing happens with turn-taking errors and false starts, both of which can pile error upon error until the user ends up in this downward spiral.
Why is the bottomless pit a problem? First, it diverts users further away from their tasks, getting them stuck in one small step of a longer process. Second, it damages users’ confidence and willingness to use the system, increases frustration levels, and potentially hampers the rest of their interactions with the system.
So how do you move the conversation forward, or at least give the user the perception of progress? Analysis of human-to-human conversations shows that errors are just as likely there as in a speech recognition application. The big difference is that humans have developed a number of strategies to repair errors and move their conversations forward.
Based on patterns that can be observed in human-to-human conversations, we present several strategies to give speech applications the same capability to get past errors and to successfully continue a call.
1. Give the user visibility into what is going on.
For critical pieces of information that are needed, give users visibility into where they are in the process and how to achieve their intended goals. For example, instead of asking for the date, inform the user of why it’s needed: Before I can make the reservation, I first need to get the date.
2. Ignore errors in noncritical parts of the dialogue.
For noncritical pieces of information, or information that could be collected later in the call, consider ignoring the error state. First, be sure that every piece of information you ask of the user is actually needed. Next, if a noncritical question meets a no-match or timeout, there is likely no need to reprompt. Instead, take the opportunity to move on:
For more information, go to our Web site at www.blahblah.com. Would you like me to repeat that?
OK. Next, to complete your transaction…
Even if the information is critical, if a no-match occurs, consider reprompting later in the call to collect the information, especially if there is a chance the caller will need to talk to an agent or the information being requested is needed only for certain paths within the application. An example of this would be a pizza-ordering application. If a no-match occurs when collecting a coupon code, consider moving on and collecting it later or with an agent. This will help move the conversation forward and build confidence on the part of the user.
System: What’s the coupon code?
User: (response not recognized)
System: Hmm, I didn’t quite make out that coupon code, but let’s get your order first and then we’ll be sure to get that coupon from you.
The other point to make here is that it is important to build user confidence at the beginning of the call. Choosing more recognition-safe interactions up front, or moving on when no-matches occur at the top of the call, builds trust between user and application while the call makes forward progress. Once the user has invested a certain amount of time and effort in a call, he is more likely to attempt to correct an error later in the call, knowing how far he has gotten.
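As a rough sketch of this strategy, the decision to defer a noncritical slot on a no-match might look like the following Python. The class and slot names here (Slot, Dialogue, handle_no_match) are illustrative assumptions, not the API of any particular VUI platform.

```python
# Sketch: on a no-match, reprompt only for critical slots; defer the rest
# so the conversation keeps moving. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slot:
    name: str
    critical: bool              # must be filled before the task can complete
    value: Optional[str] = None


@dataclass
class Dialogue:
    slots: List[Slot]
    deferred: List[Slot] = field(default_factory=list)

    def handle_no_match(self, slot: Slot) -> str:
        """Choose the next prompt after a no-match on the given slot."""
        if slot.critical:
            # Critical information: we do have to ask again.
            return f"Sorry, I still need your {slot.name}."
        # Noncritical: park the slot for later and keep the call moving.
        self.deferred.append(slot)
        return "OK. Next, to complete your transaction..."


dlg = Dialogue(slots=[Slot("account number", critical=True),
                      Slot("coupon code", critical=False)])
print(dlg.handle_no_match(dlg.slots[1]))   # moves on instead of reprompting
print([s.name for s in dlg.deferred])      # coupon code collected later
```

Deferred slots can then be revisited near the end of the call, or handed to an agent, exactly as the pizza-coupon example above suggests.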
3. Use smart reprompting for partial information.
In typical human discourse, if someone only understands half of a sentence, he phrases a question asking for the missing pieces of information while also telling the other person which parts he did understand. The same principle can be used in speech applications.
Using some conversational markers and dialogue, reprompt only for missing slots when asking for multiple slots in an utterance. For example, when asking for the month and day, reprompt with Great, what day in February? instead of I’m sorry, please repeat the day. Or when asking for city and state, if the state is recognized, then use that information to move forward:
System: Please say a city and state.
User: Fresno, California.
System: What city in California?
Along the same lines, there may be ways to reprompt users smartly when collecting digit strings or alphanumerics so that the entire utterance does not need to be repeated.
System: Please say your account number.
User: (only the first digits are recognized)
System: What were the last two digits of that account number?
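A minimal sketch of such smart reprompting, assuming a simple slot-name/value representation of the recognition result (the function and prompt wording below are illustrative, not from any specific platform):

```python
# Sketch: after a partial recognition, confirm what was heard and ask
# only for the slots that are still missing. Names are hypothetical.
from typing import Dict, List, Optional


def reprompt(expected: List[str], recognized: Dict[str, str]) -> Optional[str]:
    """Build a reprompt covering only the missing slots."""
    missing = [slot for slot in expected if slot not in recognized]
    if not missing:
        return None  # everything was captured; no reprompt needed
    heard = ", ".join(recognized.values())
    if heard:
        # Echo the recognized part so the user need not repeat it.
        return f"Got it, {heard}. What {' and '.join(missing)}?"
    return f"What {' and '.join(missing)}?"


# User said "Fresno, California" but only the state was recognized:
print(reprompt(["city", "state"], {"state": "California"}))
# -> "Got it, California. What city?"
```

The same pattern extends to digit strings: track which digit positions were recognized with high confidence and reprompt only for the remainder.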
4. Turn barge-in off where possible.
Another strategy is to ensure that a false barge-in does not happen at the beginning of a call. By not allowing barge-in for some portion of the initial prompt, or detecting a false barge-in and reprompting the user with the initial prompt instead of the no-match prompt, errors are avoided and the conversation progresses without the threat of the bottomless pit. Here is an example:
System: Welcome to XYZ Corp. What is your account number? (This prompt receives a false barge-in because the caller coughed before the prompt began.)
User: ?*%& (User cough with very low confidence score).
System: I’m sorry, I didn’t get that. Please repeat your account number.
Because of the false barge-in, this error message becomes the first prompt the caller hears instead of the intended welcome. Had barge-in been turned off, the cough would have been ignored, and the caller would have heard the full welcome prompt and been able to respond normally.
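One way to sketch this is to make interruptibility a per-prompt setting and disable it for the opening prompt. The Prompt class and field names below are assumptions for illustration; real platforms expose an equivalent barge-in property on each prompt.

```python
# Sketch: mark the opening prompt as non-interruptible so a cough
# cannot trigger a false barge-in. Field names are hypothetical.
from dataclasses import dataclass
from typing import List


@dataclass
class Prompt:
    text: str
    barge_in: bool = True  # most prompts allow interruption


prompts: List[Prompt] = [
    # The opening prompt plays to completion even if the caller coughs:
    Prompt("Welcome to XYZ Corp. What is your account number?",
           barge_in=False),
    # Later prompts, once the caller is oriented, keep barge-in on:
    Prompt("Please say your ZIP code."),
]

for p in prompts:
    mode = "interruptible" if p.barge_in else "plays to completion"
    print(f"{p.text!r}: {mode}")
```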
5. Use special no-match handling for noise and side speech.
Another alternative is to have separate error handling for very-low-confidence no-matches that are typically caused by coughs, noise, and side conversations. The strategy for handling those errors would be to not play any error message at all but simply repeat the prompt. So in the above example, the interaction would become:
System: Welcome to XYZ Corp. What is your account number? (This prompt receives a false barge-in because the caller coughed before the prompt began.)
User: ?*%& (User cough with very low confidence score).
System: Sorry, what is your account number?
The purpose of this approach is to keep the call from stalling over errors created by a recognition engine reacting to sounds that were never meant as input.
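This confidence-based handling can be sketched as a small routing function. The threshold value and function name below are assumptions for illustration; in practice the cutoff would be tuned against real call logs.

```python
# Sketch: route very-low-confidence no-matches (coughs, noise, side
# speech) to a quiet re-ask instead of the full error message.
NOISE_THRESHOLD = 0.10  # hypothetical cutoff; tune against real traffic


def error_prompt(confidence: float, question: str) -> str:
    """Pick the follow-up prompt after a no-match event."""
    if confidence < NOISE_THRESHOLD:
        # Almost certainly not an attempt to answer: skip the apology
        # and simply repeat the question.
        return f"Sorry, {question}"
    # A genuine misrecognition: acknowledge it before re-asking.
    return f"I'm sorry, I didn't get that. {question}"


print(error_prompt(0.02, "what is your account number?"))
print(error_prompt(0.45, "What is your account number?"))
```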
6. Use fallback strategies.
If a caller is having problems providing a particular piece of information, don’t keep asking for it over and over. Instead, try collecting an alternative piece of information. For example, if a caller has trouble with an account number, it is often possible to ask for a Social Security number or last name instead.
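A fallback table plus a retry limit is enough to sketch this strategy. The slot names, table, and retry limit below are illustrative assumptions:

```python
# Sketch: after repeated failures on one slot, switch to an alternative
# identifier rather than asking for the same thing again.
from typing import Dict, List

FALLBACKS: Dict[str, List[str]] = {
    "account number": ["Social Security number", "last name"],
}
MAX_ATTEMPTS = 2  # hypothetical limit before switching strategies


def next_request(slot: str, failed_attempts: int) -> str:
    """Decide what to ask for next, given how often this slot has failed."""
    if failed_attempts < MAX_ATTEMPTS:
        return slot  # not stuck yet; try the same slot again
    alternatives = FALLBACKS.get(slot, [])
    # Switch to the first alternative if one exists.
    return alternatives[0] if alternatives else slot


print(next_request("account number", failed_attempts=1))  # try again
print(next_request("account number", failed_attempts=2))  # switch slots
```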
In summary, when designing a speech application, building in a methodology for error handling is crucial to creating a successful application.
Editor’s Note: In the next and final article in this series, to be presented in the June issue, another team of VUI designers will discuss the ways in which a larger taxonomy of errors can inspire change.