Still on the Fence?

If you’re still “on the fence” regarding a speech upgrade to your voice response system, here are a few tips for taking the speech recognition plunge, along with examples of what speech will and won’t do for you.


Before tackling this article, I thought it was critical to answer a basic question: how many voice response systems actually use speech? Nothing beats good old-fashioned primary research for getting answers. As one of the founders of independent research firm Sterling Audits, I was able to enlist twenty or so of our researchers to make phone calls, 271 of them to be exact. We called voice response systems across the country at random to see how many of them use speech. The answer: 18.8 percent of the systems we called used some form of automatic speech recognition.
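How much weight can a sample of 271 calls bear? A quick sanity check, assuming the 18.8 percent corresponds to 51 of the 271 systems (51 / 271 is 18.8 percent; only the percentage is reported above). The sketch below uses the simple normal-approximation (Wald) interval, which is a rough rule of thumb rather than a rigorous survey analysis.

```python
import math


def proportion_with_margin(hits: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (share, low, high) using the normal-approximation (Wald) interval.

    z = 1.96 corresponds to a roughly 95 percent interval.
    """
    p = hits / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# Assumption: 51 of the 271 systems called used speech recognition.
p, low, high = proportion_with_margin(51, 271)
print(f"{p:.1%} (roughly {low:.1%} to {high:.1%})")
```

Under that assumption the script prints 18.8% with an interval of about 14.2% to 23.5%, so even at the high end fewer than a quarter of the systems would be using speech.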

In my view, that is far from pervasive. But I can tell you that many of the systems we called could benefit from speech recognition. So how do you decide whether to go to speech?


Before we dive into a more complex analysis, note that there are a handful of cases where speech is a “natural.” For example:

  1. Complex Alphanumeric Multi-Character Input. Simply put, if your callers have to use the touch-tone pad to spell names, spell stock ticker symbols or enter account numbers that mix letters and numbers, you need speech recognition. Doing these things with touch-tones is tedious, error-prone and frustrating for callers.
  2. City Pairs and Addresses. Transportation, entertainment and travel-related businesses are sure candidates for speech recognition. Callers find it easier to make reservations by speaking the name of a city than by keying in airport codes or spelling city names on the keypad, which is pretty rough going.
  3. Large Number of Products or Services. We’ll cover this in more detail later, but the idea here concerns the number of dialog turns and possible termini (the terminal points, or end-points, of a transaction). If you offer phone support on hundreds of products, speech recognition makes it easier for callers to find the right one.
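To see why spelling on the keypad is so error-prone, consider the simplest scheme: one key press per letter. Different strings collapse onto the same digits, so the system has to spend extra dialog turns asking the caller to disambiguate. A minimal Python sketch, assuming the standard letter groups printed on most telephone keypads; the ticker symbols are illustrative only.

```python
# Standard telephone keypad letter groups.
KEYPAD = {
    "2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
    "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ",
}
LETTER_TO_KEY = {letter: key for key, letters in KEYPAD.items() for letter in letters}


def to_digits(text: str) -> str:
    """Map each letter to its keypad digit (one press per letter); digits pass through."""
    out = []
    for ch in text.upper():
        if ch.isdigit():
            out.append(ch)
        elif ch in LETTER_TO_KEY:
            out.append(LETTER_TO_KEY[ch])
        else:
            raise ValueError(f"cannot key {ch!r} on a telephone keypad")
    return "".join(out)


def collisions(symbols: list[str]) -> dict[str, list[str]]:
    """Group symbols that produce the same touch-tone digit string."""
    groups: dict[str, list[str]] = {}
    for s in symbols:
        groups.setdefault(to_digits(s), []).append(s)
    return {digits: syms for digits, syms in groups.items() if len(syms) > 1}


print(collisions(["GE", "HD", "IBM", "T"]))  # {'43': ['GE', 'HD']}
```

GE and HD both key in as 43, so a touch-tone system cannot tell them apart without another prompt, while a caller who speaks the letters gives the recognizer the distinction directly.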

If your application does not fall into one of the three “natural” categories, you may still want to consider speech recognition. An approach I have found useful with my clients looks at the intersection of dialog turns and possible termini.

A dialog turn is a point in the voice response transaction where the machine hands the turn over to the caller, who responds with touch-tones or speech. To simplify the exercise, I usually count dialog turns based on touch-tone input only. Since that is where you are starting from (still considering speech, but using only touch-tones for now), count up the dialog turns in every nook and cranny of your application.

The number of dialog turns differs greatly from application to application and, not surprisingly, from industry to industry. Take a simple order status application: you may be using as few as ten turns, and if the order number is strictly numeric, it does not exactly scream out for speech. On the other hand, if you’re a utility company offering power outage, promise-to-pay, meter reading, account information, energy saver advice and so on, you’ll have anywhere from 50 to 120 dialog turns in your voice response application.

The other part of the puzzle is the number of possible termini: the end-points, or points of completion, in the voice response transaction. For example, if you call a retail banking application to get your account balance and enter your account number and PIN, you may hang up after hearing the balance. Reading out the balance is the terminus in this example. Or, if you call a technical support line for help with a computer or printer, you may be routed to a customer service representative once you’ve identified yourself and the product in question. Reaching that representative is a terminus in the application.
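One way to do the counting, sketched in Python under the assumption that your call flow can be written down as a tree: every node that waits for touch-tone input counts as one dialog turn, and every leaf (a balance read-out, a transfer to a representative) counts as one terminus. The node names are hypothetical, loosely following the banking example. Real call flows often reuse sub-flows and include error re-prompts, so treat the result as a first approximation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in a touch-tone call flow (hypothetical model)."""
    name: str
    children: list["Node"] = field(default_factory=list)

    @property
    def is_terminus(self) -> bool:
        # A leaf is an end-point: nothing more is asked of the caller.
        return not self.children


def count_turns_and_termini(node: Node) -> tuple[int, int]:
    """Return (dialog turns, termini) for the flow rooted at node."""
    if node.is_terminus:
        return 0, 1
    turns, termini = 1, 0  # this node prompts the caller once
    for child in node.children:
        t, m = count_turns_and_termini(child)
        turns += t
        termini += m
    return turns, termini


flow = Node("Main menu", [
    Node("Enter account number", [
        Node("Enter PIN", [
            Node("Read balance"),
        ]),
    ]),
    Node("Transfer to representative"),
])
print(count_turns_and_termini(flow))  # (3, 2)
```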

When you intersect the touch-tone equivalent dialog turns with the number of possible termini, you arrive at the so-called Turns / Termini Complexity Matrix shown here. Although this is a simplification, you can use this method to gauge the need for speech fairly reliably (of course, you also have to take the “naturals” and the “un-naturals” into account).



First, count the touch-tone equivalent dialog turns in your application. Second, count the possible termini. Plot a point on the graph as shown in the example. (These are just examples, so don’t assume your application matches the samples I’ve used here.) Simply put, if your plot point lands in the lower left-hand corner of the graph, the need for speech is questionable, and you need a “natural” or “un-natural” reason to implement speech. If your plot point lands outside the lower left-hand corner, your application is a good candidate for speech.
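The decision rule can be sketched in a few lines of Python. The cut-offs that define the “lower left-hand corner” below are hypothetical placeholders, since the matrix’s axis values are not reproduced in this text; substitute the boundaries from your own copy of the graph. The turn counts for the two sample applications come from the examples above, while their termini counts are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical boundaries of the "lower left-hand corner" of the matrix.
LOW_TURNS = 30
LOW_TERMINI = 10


@dataclass
class Application:
    name: str
    turns: int                 # touch-tone equivalent dialog turns
    termini: int               # possible end-points
    other_reason: bool = False  # a "natural" or "un-natural" reason for speech


def speech_verdict(app: Application) -> str:
    """Classify an application by where it plots on the turns/termini matrix."""
    in_lower_left = app.turns < LOW_TURNS and app.termini < LOW_TERMINI
    if not in_lower_left:
        return "good candidate"
    if app.other_reason:
        return "candidate (natural or un-natural reason)"
    return "questionable"


for app in [Application("Order status", turns=10, termini=3),
            Application("Utility", turns=120, termini=40)]:
    print(app.name, "->", speech_verdict(app))
```

The order status application lands in the corner and comes out “questionable,” while the utility application is a “good candidate”; flipping `other_reason` to `True` for order status reflects, say, alphanumeric order numbers.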



Maybe your boss thinks speech is cool, so he or she has approved the budget. Or a competitor has just launched speech on its voice response system, and the VP of customer service has ordered you to implement it come hell or high water (and instantly). Or a speech industry executive plays golf with your company’s chairman. In these instances, you’ll be implementing speech. But it’s a good idea to set realistic expectations about what it will and will not do for you.



Next: what speech will do for you. That is a big claim, so let me qualify it: this is what speech will do for you if it’s implemented competently and with good planning. Just plopping speech on top of a poorly designed touch-tone system can be disastrous. Here goes:

  1. Speech can make for a more natural dialog, so if you script the application well, callers may enjoy the experience. In recent usability testing at Sterling Audits, subjects often reported that speech-based systems seemed more pleasant to deal with. But beware: what speech will not do is solve the problem of error recovery. When mistakes happen, it’s not speech recognition that gets callers back on track but a robust dialog design with proper error recovery mechanisms. Canned error recovery routines don’t cut it. Get a professional to help you with the design.
  2. Speech helps flatten, or “tunnel” through, what would have been longer and deeper touch-tone menus, which can help callers get to what they want. But what most speech vendors don’t tell you is the effect this has on “power users”: folks who call into a system very frequently can find speech tedious and slow compared to touch-tones. Such is the case with healthcare providers who use healthcare applications; most of them have cheat sheets taped to their phones with the touch-tone sequences mapped out. Bottom line: power users use touch-tones. They can key in a string of digits ahead of the prompts (overdialing) in two seconds and get to what they want even faster than with speech. Who cares? The power users do, so don’t throw out touch-tones when you upgrade to speech.
  3. Speech paves the way for persona development, which is icing on the cake. But just because speech dialogs are more natural doesn’t mean that speech equals persona. In fact, you can develop a persona for a touch-tone only system. So if you want to build a brand and a persona for your system, speech technology cannot do that for you on its own. Yes, speech vendors dabble in persona development, but that costs extra, over and above the speech licenses. Don’t expect a persona to jump out of the box when you buy speech.
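Overdialing works because a touch-tone system can act on digits typed ahead of the prompts: each buffered digit selects the next menu option, so a power user skips every menu in between. A minimal Python sketch of that behavior; the healthcare menu below is hypothetical and not taken from any real system.

```python
from typing import Optional

# Hypothetical touch-tone menu: each key leads to a submenu (dict) or a destination (str).
MENU: dict = {
    "1": {
        "1": "Eligibility",
        "2": {"1": "Claim status by claim number", "2": "Claim status by patient"},
    },
    "2": "Benefits",
    "0": "Representative",
}


def navigate(menu: dict, digits: str) -> tuple[Optional[str], int]:
    """Follow overdialed digits through the menu without waiting for prompts.

    Returns (destination, digits consumed). The destination is None if the
    digits run out while still inside a menu, where normal prompting resumes.
    """
    node = menu
    for i, d in enumerate(digits, start=1):
        if d not in node:
            raise ValueError(f"invalid key {d!r} at step {i}")
        node = node[d]
        if isinstance(node, str):
            return node, i
    return None, len(digits)


print(navigate(MENU, "122"))  # ('Claim status by patient', 3)
```

Three digits reach the destination without sitting through the menus in between, which is the shortcut you preserve by keeping touch-tones alongside speech.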

Edwin Margulies is co-founder of Sterling Audits, a firm dedicated to quality improvements in customer service automation and contact centers. The company specializes in benchmarking the usability of self-service systems. As EVP and Chief of Research, Margulies is responsible for projects including the 2004 Web Site Usability Almanac and the 2004 Voice Response Usability Almanac. He is also on the board of directors of AVIOS, where he chairs the Marketing Committee. He can be reached at 702-341-0314 or ed@sterlingaudits.com.
