But Is It Natural?
Posted Apr 2, 2009

It’s raining. You’ve got a vicious deadline clawing at your back, and it has been a long day of intermittent Internet access. Your service cuts in and out in the middle of every email you try to send. The lights on your wireless router flicker impotently, and the signal from your one neighbor with a wireless network that doesn’t require a password is so weak that you can’t get in. You get just enough of a signal to see the network name, BiLLsFAN77, sitting there, taunting you. Your rage is building. You tear the handset off the cradle and start calling your Internet service provider. 

Hello, and welcome to the XYZ Corporation’s DSL Help Line, a cheery woman, to whom you are ready to read the riot act, greets you. 

Your call is very important to us, she bubbles on. In order to be properly serviced, please choose from the following menu.

Oh, brother, a menu.

If you’d like to order service, press 1. If you’d like to upgrade your current service, press 2. If you’d like to check your bill, press 3. If you’d like to…

Your ISP’s system is a nightmare. It’s a winding, hierarchical maze of one touch-tone input after another, and whichever option you need seems to be at the tail end of every menu you get kicked into. It takes 20 minutes to reach a live operator, and all the while you’re thinking, “There’s got to be a better way to navigate this thing.” 

There is: open-ended prompts.

An open-ended prompt is perhaps easiest to define by what it isn’t. The open-ended prompt is very different from the directed dialogue or closed-ended prompt with which many users are most familiar. When users call into an open-ended interface, rather than being provided with a list of several options that they then need to repeat to the computer, open-ended prompts ask more broad-based questions, like How may I help you? and let users respond in their own words, getting as detailed as they care to.

Open-ended prompts do this by capitalizing on statistically based natural language (NL) technology. The system draws from a bank of thousands (or even tens of thousands) of recorded utterances. The utterances are all tagged, usually manually, and then linked to corresponding routes within the domain. Based on the similarity of a caller’s utterance to those tagged examples, the system is able to figure out what the user wants; sort of like “the survey says” from Family Feud, it is based on aggregated responses. 
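The article doesn’t specify any vendor’s actual algorithm, but the tagged-utterance matching it describes can be sketched in miniature. The toy below routes a caller’s utterance by bag-of-words cosine similarity against a few hand-tagged examples; the utterances, route names, and similarity measure are all invented for illustration, and a real statistical NL system would train on thousands of transcribed calls.

```python
# Toy sketch of tagged-utterance call routing (not any vendor's actual
# method): compare the caller's words against a bank of manually tagged
# utterances and pick the route of the most similar one.
import math
from collections import Counter

TAGGED_UTTERANCES = [  # (transcribed caller utterance, hand-assigned route)
    ("i want to check my account balance", "billing"),
    ("how much do i owe on my bill", "billing"),
    ("my internet keeps cutting out", "tech_support"),
    ("the connection drops every few minutes", "tech_support"),
    ("i'd like to upgrade to a faster plan", "sales"),
]

def vectorize(text):
    """Bag-of-words term counts for a lowercase utterance."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def route(utterance):
    """Return the route whose tagged utterance best matches the caller's."""
    vec = vectorize(utterance)
    best_route, best_score = None, 0.0
    for tagged, dest in TAGGED_UTTERANCES:
        score = cosine(vec, vectorize(tagged))
        if score > best_score:
            best_route, best_score = dest, score
    return best_route

print(route("my internet connection keeps dropping"))  # tech_support
```

With only five tagged utterances the matching is fragile, which is exactly the article’s point about needing tens of thousands of them before call-steering accuracy becomes acceptable.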

To get to the point of being able to process a call correctly, though, the system has to have thousands of previous utterances that have been sorted and tagged. Many designers insist that tens of thousands of utterances must be captured and tagged before correct call-steering rates climb into the high 80s to low 90s, percentage-wise.

This, as one might expect, entails a labor-intensive and expensive development process. Justifying these costs is one of the biggest challenges facing designers and developers who want to implement an open-ended approach. Many designers are very aware of this. Almost every designer will tell you that she’s agnostic about the technology and wouldn’t advocate it for a number of domains, but many will also argue that very practical applications can make it a good solution for others.

“The advantage, I think, is to the end user,” says Jim Larson, a consultant, VoiceXML trainer, and co-chair of the World Wide Web Consortium’s Voice Browser Working Group. “The end user no longer has to listen to a series of menus asking questions that you can only respond to by a small number of choices. That’s sort of like playing 20 Questions until the system can guess who you want to speak with or what you want to do. Those menu hierarchies are very time-consuming, they’re tedious, and users just hate them.”

Phillip Hunter, vice president of the Voice Interaction Group at SpeechCycle, asserts that callers shouldn’t be exposed to a hierarchy of more than five categories. An open-ended prompt can cut that exposure and do so in a way that’s far more conversational than traditional modes. In fact, it may be one of the only effective ways to cut down on that while still retaining a single dial-in number.

“A Web site has multiple entry points,” says Juan Gilbert, associate professor of computer science and software engineering at Auburn University. “A Web site can have 100 pages, and you can go directly to any given page. But in a VUI, the entry point is always the same. You can’t call a VUI and enter on page seven.” 

For this reason especially, open-ended prompts make a compelling case. They don’t require a long list of choices at all. 

It should be noted, however, that open-ended prompts do not mean an open-ended system. Usually, only the first prompt is open-ended. From that initial response a user is generally channeled right to a traditional directed dialogue, which gathers information and does the rest of the routing work. The open-ended prompt should be thought of as really just an opening salvo.

Once you’ve committed to implementing an open-ended prompt, though, plenty of devilish details in the design need to be sorted out. One of the first is how to develop the initial prompt. Open-ended VUI design is very different from the grammar-based approach. The wording has to be dictated by what the data shows to be statistically effective. You can’t start gathering data, though, until you have the prompt, and it’s just too expensive to try to collect data for several different prompts. 

In this sense, it’s a chicken-and-egg proposition. This is why, especially for an open-ended prompt, it’s crucial to have an experienced designer—one who understands what she is doing, can draw on that experience to avoid known mistakes, and has an intuitive sense of how to begin.

“Five years ago there weren’t very many people who had that sort of experience, and there were a lot of things we hadn’t learned about open-ended prompts,” Hunter says. “Now with the amount of experience that has been gained about them, there’s really no reason to not be able to find someone who’s had experience and, at least, get some good solid pointers, if not actually have them do it for you.” 

Another challenge facing open-ended prompts is many users’ sheer unfamiliarity with NL systems. When asked, How may I help you? they often seize up. They’re used to command-word interactions where they stiffly and robotically respond with utterances like Yes, No, and Wednesday. Left with the option of saying whatever they want, however they want to say it, users are often skeptical about whether the computer on the other end will actually understand them or whether they’ll end up in a part of the system they never intended to reach. 

Susan Hura, vice president of user experience at Product Support Solutions and founder of SpeechUsability, describes this seizing up as the “deer in the headlights” effect.

“Most people have experienced a speech system at this point,” she says, “but very few people have dealt with an open-ended natural language prompt up front. When you’re in the process of doing these designs, you’re always fighting against what’s going on in the user’s mind. Are we freaking them out? Do they know what to say? Very often, what you’ll hear as you monitor calls…into these applications [is] the open-ended prompt, and then you’ll get this long silence.”

Hura herself pauses after she says this and then laughs.

VUI designers have long overcome this obstacle by providing an example after an open-ended prompt. If, for instance, the opening prompt was How may I help you?, it could be followed by You can say something like, “I’d like to check my account balance.” 

The example gives users something from which to pattern their responses. If the example given is, itself, a high-frequency response—that is, a leading reason that a given contact center is getting calls in the first place—the example often gives users an exact phrase for that reason and takes care of a lot of the potential responses right off the bat. 

Many designers will say that it’s also important to give only one example. They have found that users are so used to directed dialogues that when given more than one example, they perceive it as a list. They may end up repeating things from the list that they think are closest to what they want, and then wind up channeled to the wrong operator or menu. To illustrate, a user might want to check her balance, but if that isn’t one of the examples and Pay my bill is, she may end up selecting that one because it seems to most closely approximate what she’s looking for.

Just Back Off

At SpeechCycle, Hura’s deer-in-the-headlights phenomenon is referred to more technically as “performance anxiety.” In addition to some of the more common workarounds, SpeechCycle is a proponent of offering users a back-off menu for any open prompt. A back-off menu is an additional list of usable phrases. Users are given the option of accessing it by saying something like What are some choices? They are then read a list of possible responses, much like what one would find in a directed dialogue. This lends users who are particularly uncomfortable with the openness of an open-ended prompt a kind of crutch; unlike in a traditional directed dialogue, the only users who have to sit through a list of choices are those who elect to do so. 

According to Hunter, a lot of users end up choosing this feature. “One thing we also found is that when they hear the list, it renews their confidence in the system, and some callers will actually use an open [statistical language model]-type utterance,” he says. 

As a result, it’s important to design the back-off menu to be open-ended, like the first prompt, Hunter adds. Users can become confused if they are allowed to say whatever they want to start with and are then suddenly limited, without warning, after asking for more options.

“Traditionally, the speech world has sort of slapped those callers in the face and given them a no-match,” he says, “which is really horrible because they asked them an open-ended question and the caller cooperated…and then they get the [back-off menu] and say something like, I want to check my balance, and the speech system says, I’m sorry. I didn’t understand you. It’s like saying, ‘First I can take it and now I can’t.’”
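The flow Hunter describes can be sketched as a few lines of dialogue logic. Everything in this sketch is hypothetical: the prompts, the menu choices, and the keyword-matching `classify` stub (a stand-in for a real statistical classifier) are invented to illustrate the design point that the back-off menu keeps listening open-ended instead of returning a no-match.

```python
# Hypothetical dialogue-flow sketch: open-ended first prompt, a back-off
# menu on request, and free-form utterances still accepted at the menu.
BACKOFF_CHOICES = ["check my balance", "pay my bill", "report an outage"]

def classify(utterance):
    """Stand-in for a statistical NL classifier; crude keyword matching."""
    text = utterance.lower()
    if "balance" in text:
        return "billing_balance"
    if "pay" in text or "bill" in text:
        return "billing_payment"
    if "outage" in text or "internet" in text:
        return "tech_support"
    return None

def handle_call(utterance):
    """Return (next_prompt, route); route stays None until the caller is steered."""
    if "choices" in utterance.lower():
        # Back-off menu: read the list, but keep listening open-ended.
        menu = "You can say: " + ", ".join(BACKOFF_CHOICES) + "."
        return (menu, None)
    route = classify(utterance)
    if route is None:
        # Reprompt rather than dead-ending the caller with a no-match.
        return ("Sorry, I didn't catch that. How may I help you?", None)
    return ("One moment while I connect you.", route)

# A nervous caller asks for the menu, then answers in their own words anyway.
prompt, route = handle_call("what are some choices")
prompt, route = handle_call("I want to check my balance")
```

The key design choice, per Hunter, is that the second turn succeeds even though the caller ignored the menu’s exact wording.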

The open-ended prompt also presents problems in terms of its flexibility to change. As Larson points out, adding or removing a category requires new statistical data and a revision of tags. 

“These things are difficult and time-consuming to create and are difficult to modify,” he says, noting that it’s especially true if you’re looking to add categories. 

In this case, new responses will have to be captured so the system can recognize them and route properly. This not only costs money but takes time as a system trains and adjusts to fluctuations. It may be days or weeks before a new category is properly calibrated and calls are routed with fully optimized accuracy. It’s not, however, like going back to square one.

Often, when a prompt is changed, it’s not even the question itself that is altered, but the example that follows it. Sometimes, this example is changed to reflect shifting data. If a contact center, for instance, sees a certain kind of call overtaking the original example in iterations, it might be a good idea to swap it out.
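That swap decision is driven by frequency data. As a rough sketch, with entirely invented call counts and category names, the check might look like tallying recently routed calls and comparing the current example’s category against the leader:

```python
# Hypothetical sketch: decide whether the prompt's example should be
# swapped by counting routed call reasons over a recent window.
from collections import Counter

recent_calls = (["tech_support"] * 480       # invented tallies
                + ["billing_payment"] * 310
                + ["sales"] * 90)
current_example_category = "billing_payment"

counts = Counter(recent_calls)
leader, leader_count = counts.most_common(1)[0]

if leader != current_example_category:
    print(f"Consider a new example: {leader} now leads with {leader_count} calls.")
```

As the article notes next, though, the tally alone doesn’t settle it: retiring the old example degrades performance for callers who were patterning on it.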

This, however, can pose some challenges and should not be taken lightly.

“Changing that example, you will see a multiple-percentage-point increase in the use of the new example and a multiple percentage decrease of the old example,” Hunter says. “When you remove it, now people are still calling for that reason, but they don’t have the model as to how to say it, and they’ll say it all sorts of ways. Those ways were probably not accounted for, and are probably not in your data. Now they have to be, and you’re going to suffer some performance decrease.” 

Even if you don’t make any changes to the prompt or the available categories, maintaining the system’s accuracy requires constant data collection for occasional grammar updates. 

Roberto Pieraccini, chief technology officer at SpeechCycle, points out that as marketing campaigns roll out, new language appears and demands a grammar refresh to route accurately.

“We don’t hear about those changes until after they’ve already happened,” Hunter adds, “so with our continual focus on collecting data, week after week, we are able to go back in time and say, ‘We just found out about this change, and here’s where it occurred, three weeks ago.’ That allows us to respond to our customers very quickly. And that’s much more powerful than snapshot data collection where you turn your tuning grammars on for, say, a week and transcribe it.”

Clearly, building an open-ended prompt is no cakewalk. It requires constant attention. In essence, a system is never done being built. That said, open-ended prompts offer a potential increase in customer satisfaction through shorter wait times and a less lugubrious interface—and from the domain’s perspective, at least ostensibly, less of a tax on resources through faster call processing and handling times.

It should be said, though, that even the world’s best NL system, or even the world’s best-designed VUI at large, can sink like a passenger ship slamming into an iceberg if it kicks the user to a representative who can’t help him or lines him up in a queue where he must wait endlessly for help. After all, when you’re gripping the phone, forehead veins throbbing with hot blood, do you really care how quickly your call was routed to the on-hold Muzak-version soprano sax solo that has droned on for the past hour or more?

A Matter of Priorities

That’s precisely what Melanie Polkosky, a human factors psychologist and senior consultant at IBM, points to. “If someone’s on hold for 20 minutes, you’ve not reinforced him for correct usage,” she says. “Until corporations really start to prioritize the full customer experience—and I’m not just talking about IVR in a box, and let’s just change our IVR and expect everything to be better—until this whole thing is seen as a total customer experience, what we’re able to do just within an IVR is going to be fairly limited.”

In other words, if a system properly routes someone, but he’s still not rewarded with the service he expects, he’s not going to trust that system.

Bad service has repercussions well beyond a given domain’s IVR, too. When a caller enters a system, she enters with the sum total of her entire IVR experience. Each system is really training against every other system, because “there are far more bad systems out there, still, than good ones,” Polkosky asserts. 

That is to say, a poor IVR experience in one domain could potentially poison all domains with which a given caller interacts in the future. For this reason, Polkosky worries not so much that users are not learning how to use IVRs, but that they are learning from the poorly designed systems that dominate the field.

Even if we begin this move toward a more data-driven approach, it could prove slow to implement. The technology and data gathering are still relatively expensive and, for smaller enterprises especially, the costs might not be sustainable. This means that many systems out there are probably going to continue working as they have for a long time to come.

To get the most out of a system, Polkosky says a domain has to pay as much attention to the live operators receiving calls as it does to the IVR, lest the company risk investing in an IVR for nothing. Above all else, she thinks that the entire user experience—from the beginning of the call all the way to the end—is more important than a narrow focus on the technology alone.

“For the most part, there needs to be a top-down view, where people are [asking], ‘What’s the IVR experience?’ and ‘What are all the other things in that total customer experience?’” Polkosky continues. “I don’t think the opportunity to ask an open-ended question instead of a closed-set question makes or breaks a customer interaction....You can have good and bad design regardless of the technology.”