Usability Scorecard

This is the second in a series of “interactive columns” where the STM readership can participate in auditing voice response systems using a Web-based research portal. In this issue, we explore sample data based on research conducted by the readership of STM and the Sterling Audits staff since the last issue. Next time, we will highlight a company that scores high in the Sterling Audits Usability Index.

40,000 Answers to Survey Questions So Far

First, my thanks to all of you who’ve participated in the research initiative that is the engine of content for this column. Over the past two months, we have garnered answers to more than 40,000 individual questions that are part of the Sterling Audits Usability Index and its associated Web-based research portal. STM readers have made a significant contribution to this effort. Many of you heard about the research portal through the open call for researchers, but a large number of our researchers have joined out of professional curiosity. (It sure can’t be the paltry $10 stipend we pay for each completed survey.)

As a quick review, the Sterling Audits Usability Index is the industry’s first open and non-proprietary methodology for scoring and weighting the usability of self-service systems. Here, we concentrate on voice response, although the survey instrument covers web site usability as well. The critical success areas we focus on are: 1) Navigation; 2) Content; 3) Usability; 4) Interactivity; and 5) Credibility.

Each of these critical success areas represents dozens of usability-oriented questions. Each of the five areas is weighted to be equal to 20 percent of a perfect score of 100 percent. No system so far has scored 100 percent, but that’s not really the point. The point is to use the Index to benchmark your voice response system against your peers’ systems. This data can be used to make usability improvements which, in turn, can give you some uplift in efficiency and customer satisfaction.
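To make the weighting concrete, here is a minimal sketch of how five equally weighted area scores could roll up into one Index score. The function name, the idea of expressing each area as a fraction between 0 and 1, and the sample numbers are all assumptions made for illustration; they are not the actual Sterling Audits tabulation or real survey results.

```python
# Hypothetical illustration only: names and sample values are assumptions,
# not the actual Sterling Audits scoring procedure.
AREAS = ("Navigation", "Content", "Usability", "Interactivity", "Credibility")
AREA_WEIGHT = 20.0  # each area is worth 20 of the 100 possible points


def index_score(area_scores):
    """Combine per-area scores (fractions from 0.0 to 1.0) into a 0-100 score."""
    missing = [area for area in AREAS if area not in area_scores]
    if missing:
        raise ValueError("missing area scores: " + ", ".join(missing))
    total = 0.0
    for area in AREAS:
        fraction = area_scores[area]
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(area + " score must be between 0.0 and 1.0")
        total += AREA_WEIGHT * fraction  # an area contributes at most 20 points
    return total


# Made-up scores for one imaginary system.
example = {
    "Navigation": 0.5,
    "Content": 0.75,
    "Usability": 0.5,
    "Interactivity": 0.25,
    "Credibility": 1.0,
}
example_total = index_score(example)
print(example_total)  # 60.0
```

Because every area is capped at 20 points, a system that is flawless in one area but weak in the other four still lands well short of 100, which is why the Index is more useful for benchmarking against peers than for chasing a perfect score.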

In this issue, I have taken one example from each of the critical success areas to share with you. This reveals the level of granularity and depth of the survey. I show here only five questions out of over 200, so this is by no means comprehensive. Therefore, the purpose of this issue’s column is to inspire you as a reader to get involved in the research initiative.


The Navigation questions in the Sterling Audits Usability Index deal with the way callers are guided through the call logic flow of a voice response system. You “Navigate” based on input modalities such as Navigation Keys (touch-tone) or Navigation Words (speech commands) or a combination of the two. The scripting of the application consists of the words and sentences that are spoken to you by the machine. There are many factors that go into a quality experience navigation-wise. These include, to name a few: the usefulness of persona in navigation; menu structure, length and related short-term memory issues; the use of automatic speech recognition; and multimodal input (touch tones and speech).

There are also Navigation questions dealing with operator access, the use of standard navigation keys and navigation words; seasonal prompts and special alerts; default treatment logic; and over-dialing or barging-in over system prompts. As with all of the survey questions, a score is tabulated based on how these items are used and most importantly how they affect the caller.

The example Navigation question we have chosen deals with callers’ short-term memories. Specifically, it asks whether callers had to hang up and call back to get their work done or whether the system provided a crisp, easy-to-remember list of choices. It’s telling that only 8.4 percent of respondents reported that they were “…able to remember all of my choices easily” (a top score of 10 on a 1-to-10 scale). In another recent survey conducted by Sterling Audits, based on 270 randomly selected voice response systems, we discovered that 18.8 percent of voice response systems use speech. This means speech is no panacea for human memory challenges.


Content deals with the informational make-up of a voice response system. This deals with the depth and breadth of a system’s capabilities to automate otherwise agent-guided tasks. So in this context, the content of a voice response system defines its robustness. It all comes down to a measure of how useful the options are and if the information you are looking for is accessible. For example, a retail bank’s voice response system with no account balance function would be regarded as being “light” on content.

Other topics covered in the survey’s content questions include the use of certain words and their meaning; asking for instructions and help; and the use of extraneous messages. I chose a simple question for the example here, which deals with how useful the self-service experience was content-wise to the callers. The question uses a scale of 1-to-10 where one is “I couldn’t really do much with the system at all” and 10 is “The system allowed me to complete all the transactions I needed without an operator.” We got some pretty interesting results. In short, about 25 percent of callers scored systems in the range of “five” or below. This means that these callers did not get everything done that they needed and had to speak to an operator to do so (or only got partially what they needed). Of course, there are business rules that dictate how much can be done automatically. There are practical reasons, too, why you might not want to automate everything. But what’s interesting is the callers’ perspectives. The question is posed in such a way that respondents are answering based on their own perceptions of usefulness, not the perception of the enterprise. I’m hypothesizing that far more functions can be automated than are currently being tried. I also hypothesize that callers may be more tolerant of automation than we thought when we embarked on this research initiative.


Usability is a measure of how easy the system is to use. Of course, what weighs heavily is the human side of the equation. That is, how people judge the ease with which they are able to get useful work done. Users will judge the overall usability of a system based on their perception of its efficiency and their overall satisfaction with the experience. We have established indices on: Intelligibility; jargon (shop talk); pacing; mnemonic input; application consistency; and the order of menu choices to name a few. We also ask questions on the number of steps to task completion; “being forced to listen” and a personal favorite: Turn-taking indicators and machine behavior.

For the Usability example question, we hoped to discover how users perceive the use of turn-taking indicators in voice response systems. We discovered that the four most popular turn-taking indicators are: Earcons (also called “ear candy” or beeps or music); Prosody (the “music” or inflection of speech); syntax (grammatical structure); and the use of Dialog Pauses. What was striking in the answers so far is that only 41.2 percent of respondents reported that “Dialog Pauses” were used as turn-taking indicators in the applications they called. I’ve concluded from this that there are a lot of voice response systems out there that just don’t shut up – so callers have to clue in to other indicators to know when it’s their turn to speak or hit touch-tones. This looks like a big area of concentration for designers of systems. Let’s face it – it’s really easy to insert pauses. In fact, in most cases, you don’t even have to re-record prompts or re-do logic flows to add silence. So I’m curious to find out – over time – why the use of silence is not more prevalent.


Interactivity deals with the overall transactional integrity of the system. This is a measure of how predictable and reliable a system is when it is performing dialog transactions, host and database lookups, managing caller input and performing error management. Good transactional integrity means that callers can succeed in getting their tasks done in a timely fashion (and with the correct results). Here, we deal with touch-tone and speech co-existence and fall-back to touch-tone and other input management issues. Also important are ease of validation and confirmation steps, error management, host response time and general delays to name a few.

Many of my clients are focused on task completion and transaction failures, since these are central to the issue of containment (the ratio of callers who stay in self-service to callers who opt out to a customer service rep). So we’ve chosen a question that deals with task completion as an example. Here, we use a multiple-choice question: “Based on the transactions the system says it supports, were you able to get them done?” The answers were kind of shocking. Only 42.7 percent of respondents claim to have been able to “do it all” without hassles or retries. That means the majority of voice response systems are designed in such a way that the majority of callers experience errors or just don’t know what to do. This spells big opportunity for speech vendors because ASR-based systems, if designed properly, can be easier interaction-wise. To be fair, even a well-designed touch-tone system can allow callers to do transactions without retries, especially power users, who are typically partial to touch-tone systems.
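As a rough illustration of the containment measure mentioned above, the sketch below computes it two ways: as the ratio of callers who stay in self-service to callers who opt out, and as the share of all callers who stay. The function name and the call counts are assumptions invented for the example, not survey data, and the second figure is a common alternative way of expressing containment rather than something defined in this column.

```python
# Hypothetical illustration only: the function name and the call counts
# below are assumptions, not figures from the survey.
def containment(stayed_in_self_service, opted_out):
    """Return (stay-to-opt-out ratio, share of all callers who stayed)."""
    if stayed_in_self_service < 0 or opted_out < 0:
        raise ValueError("call counts cannot be negative")
    total = stayed_in_self_service + opted_out
    if total == 0:
        raise ValueError("need at least one call to measure containment")
    # Ratio of callers who stayed to callers who opted out to a rep.
    ratio = stayed_in_self_service / opted_out if opted_out else float("inf")
    # The same idea expressed as a fraction of all callers.
    share = stayed_in_self_service / total
    return ratio, share


# Made-up example: 300 callers finish in self-service, 100 opt out.
ratio, share = containment(300, 100)
print(ratio, share)  # 3.0 0.75
```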


A voice response system’s credibility deals with the level of comfort or “safeness” a visitor experiences with the system. Credibility means that users believe what they hear and can trust the information is correct and that the people running the voice response system are trustworthy. Credibility also deals with the seriousness with which callers take the system. This is a challenging area because so much of this information is qualitative and based on subjectivity. Nonetheless, we have established many metrics on the subject and mix a good bit of quantitative analysis with the qualitative.

Here, we tackle system persona, anthropomorphism (the imbuing of human characteristics onto an inanimate object), machine-centric versus caller-centric behavior, chatty discourse, politeness and tone of dialog. The results from surveying these issues are provocative. What’s really interesting is the prospect of “de-hyping” persona and getting down to what works and what doesn’t work. Other important aspects of credibility deal with the timeliness of information, consistency of voices (talent), diction, accents, gender, predictability and one of my favorites: dead ends or endless loops.

The example question here deals with apologies and blame. Callers were asked to give their impression on a scale where one end (a score of one) meant the machine was overly apologetic and the other end (a score of 10) meant the machine seemed to blame the caller for mistakes. Surprisingly, most respondents recorded a score of five or above, which means that the majority of systems tend to blame callers for mistakes. To be fair, less than one percent recorded an answer of “10” so it’s not all that terrible. But what this suggests is that too many systems take out their own failures on the caller. Clearly, blame is no incentive for callers to use automation.


How You Can Participate in this Column

Just log on to the research portal at http://www.sterlingaudits.com/research.html. Sign up as one of our researchers. When you input the company name of the voice response system you wish to survey, put “STM” after the name. The syntax: “ABC Incorporated STM.” This will allow us to distinguish surveys submitted by the readership from those submitted by the regular research staff. Submit a few of the companies you do business with as projects. Once approved, you’ll get a notice to go ahead with the survey the next time you log on. You get a $10 stipend for your trouble. The 10 bucks is nothing, but you’ll be part of supporting research we all can take advantage of.


Edwin Margulies is co-founder of Sterling Audits, a firm dedicated to quality improvements in customer service automation and contact centers. The company specializes in benchmarking the usability of self-service systems. As EVP and Chief of Research, Margulies is responsible for projects including the 2004 Web Site Usability Almanac and the 2004 Voice Response Usability Almanac. He is also on the board of directors of AVIOS, where he participates as the chair of the Marketing Committee. He can be reached at 702-341-0314 or ed@sterlingaudits.com.


