November 9, 2015
By Leonard Klie, Editor, Speech Technology and CRM magazines

The Art and Science of Error Handling

Is there a general rule as to how long a system should wait for a response after asking a question?

You need to leave an appropriate pause between questions so the caller can jump in when needed. First, give callers credit and assume that they know how to answer most questions. It's as much an art as a science. There is no magic number of seconds to wait, but if you follow natural, innate dialogue rules, where you give someone a certain amount of time to answer, and you take into account the peripheral things the caller may need to get or look up before answering, you start to get something manageable. The key is to give callers time to respond without throwing in additional instructions. Save those instructions for the callers who need them.
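As a rough illustration of that idea, the sketch below widens the listening window for questions that may send the caller off to look something up, and adds instructions only after the caller has stayed silent. The class, the field names, and the timeout values are hypothetical and not taken from any particular speech platform.

```python
from dataclasses import dataclass


@dataclass
class Question:
    prompt: str                      # the bare question, played first
    help_prompt: str                 # extra instructions, held in reserve
    base_timeout_s: float            # assumed value; tune per question
    lookup_allowance_s: float = 0.0  # extra time if the caller must fetch something


def listen_window(q: Question) -> float:
    """Seconds to wait before treating silence as a no-input event."""
    return q.base_timeout_s + q.lookup_allowance_s


def next_prompt(q: Question, silent_attempts: int) -> str:
    """Play the bare question first; add help only after the caller was silent."""
    if silent_attempts == 0:
        return q.prompt
    return f"{q.help_prompt} {q.prompt}"


card = Question(
    prompt="What's your card number?",
    help_prompt="It's the sixteen-digit number on the front of your card.",
    base_timeout_s=4.0,
    lookup_allowance_s=10.0,  # the caller may need to find the card
)
```

Here `listen_window(card)` gives the caller 14 seconds, and `next_prompt(card, 1)` is the only path that plays the extra instructions.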

Can you build something into the script, text-to-speech engine, or voice recording that lets the caller know when it's his turn to answer?

Absolutely. In a phone conversation, we lose so much without the visual cues. All those things that we put into our voice and take for granted in human-to-human conversations are just as important if a speech application is going to mimic one side of a conversation. Coach your voice talent to convey the meaning properly and to put intonation at just the right point, so that it is clear to callers exactly what we need them to pay attention to. Intonation is just as important as the words themselves in conveying meaning and turn-taking cues.

How do you build that right into the programming?

Systems today can be programmed to do just about anything with the right logic, but the designs have become so modular that not enough attention is paid to what happens between each question. It was always considered an egregious sin in the industry to turn off barge-in and not allow callers to interrupt, but now you can judiciously disable barge-in at appropriate times, such as when the system is doing a data dip.
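One way to picture this is a call flow in which barge-in is a per-step setting that defaults to on and is switched off only where interruption makes no sense. The sketch below is a hypothetical representation; the step names, prompt text, and `barge_in` flag are assumptions, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptStep:
    name: str
    text: str
    barge_in: bool = True  # default: callers may interrupt the prompt


# Hypothetical flow: barge-in is off only while the system fetches data.
FLOW = [
    PromptStep("ask_account", "What's your account number?"),
    PromptStep("data_dip", "One moment while I look that up.", barge_in=False),
    PromptStep("ask_action", "Would you like your balance, or recent transactions?"),
]


def barge_in_disabled(flow):
    """List the steps where callers cannot interrupt, for design review."""
    return [step.name for step in flow if not step.barge_in]
```

Listing the exceptions with `barge_in_disabled(FLOW)` makes it easy to check that each one was a deliberate choice.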

Other things also get overlooked in the design, like transition states to let the caller know that we're done here and we're going on to the next step. Callers are on this linear journey with you, and if you let them know what just happened and what's coming next, it not only helps them to know whose turn it is in the conversation, it also lends a sense of movement to the conversation so they feel a sense of progression.

How could the system be programmed to recognize when it's having a problem and to know what to do about it?

It is something you have to design for ahead of time. If you already have an existing system, I would start by diagnosing it fully. There are a few things you can look for. The tuning process is part quantitative and part qualitative. Take your call transcripts and see how many caller utterances were in grammar and how many were out of grammar. A lot of other things outside of the speech also get transcribed but are often dismissed or overlooked because they're not part of the dialogue, such as noise, stutters, false starts, and callers giving partial responses or repeating themselves. These are all signs that callers are having problems, and they shouldn't be ignored.
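The quantitative half of that review can be sketched in a few lines. The example below assumes each transcribed utterance has already been marked as in or out of grammar and annotated with a set of tags; the record layout and the tag names are hypothetical.

```python
from collections import Counter

# Hypothetical transcription tags that signal a caller in difficulty.
TROUBLE_TAGS = {"noise", "stutter", "partial", "repeat", "false_start"}


def summarize(records):
    """Summarize (in_grammar: bool, tags: set of str) records from call transcripts."""
    total = 0
    in_grammar = 0
    trouble = Counter()
    for ok, tags in records:
        total += 1
        in_grammar += 1 if ok else 0
        trouble.update(tags & TROUBLE_TAGS)  # count only the trouble signs
    rate = in_grammar / total if total else 0.0
    return {"total": total, "in_grammar_rate": rate, "trouble": dict(trouble)}


sample = [
    (True, set()),
    (False, {"noise"}),
    (False, {"noise", "stutter"}),
    (True, {"cough"}),  # not a tracked tag, so it is ignored
]
report = summarize(sample)
```

Running the same summary per dialogue state, rather than per application, would show which questions attract the most trouble signs.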

Also look at the types of responses you're getting, particularly if you're looking for one type of response and you're getting a completely different type, like a phone number in response to a yes/no question. Trace those back in the call logs to see what's happening. Diagnose your system; you can usually find pretty elegant ways to compensate for errors if you know where they are happening. Then build the fix into the system so the same thing doesn't happen again.
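A simple pass over the call logs can surface these mismatches. The sketch below assumes each log entry records the dialogue state, the response type the designer expected, and the transcribed utterance; the classifier is deliberately crude, and its word list and digit threshold are illustrative assumptions.

```python
import re

YES_NO = {"yes", "no", "yeah", "yep", "nope"}  # assumed word list


def classify(utterance: str) -> str:
    """Very rough response typing, for log triage only."""
    text = utterance.strip().lower()
    if text in YES_NO:
        return "yes_no"
    digits = re.sub(r"\D", "", text)  # keep digits only
    if len(digits) >= 7:              # assumed: long digit strings are phone numbers
        return "phone_number"
    return "other"


def mismatches(log):
    """Return (state, expected_type, observed_type) where the two types differ."""
    found = []
    for state, expected, utterance in log:
        observed = classify(utterance)
        if observed != expected:
            found.append((state, expected, observed))
    return found


log = [
    ("confirm_address", "yes_no", "yes"),
    ("confirm_address", "yes_no", "555 123 4567"),
    ("get_phone", "phone_number", "5551234567"),
]
```

With this sample log, `mismatches(log)` flags only the second entry, pointing the designer to the `confirm_address` state as the place to listen to calls.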

What tips would you give other designers for handling the typical turn-taking errors?

The first thing you need to do is to look at the bigger picture. These conversations are linear and sequential, and you have to anticipate what will come before and after any question you ask.

Then, look at your prompt structure. Don't make the pauses too short. Structure your questions so the caller clearly knows when it is his turn to respond. Judiciously allow or disallow barge-in where appropriate. Allow enough time between options so callers can process them and respond.

And lastly, get back to the basic roots of any conversation. Remember that the machine is trying to mimic human-to-human conversation, and the same basic rules have to apply.

Senior News Editor Leonard Klie can be reached at lklie@infotoday.com.
