This is the second in a series of interactive columns in which the STM readership can participate in auditing voice response systems using a Web-based research portal. In this issue, we explore sample data based on research conducted by the readership of STM and the Sterling Audits staff since the last issue. Next time, we will highlight a company that scores high on the Sterling Audits Usability Index.
40,000 Answers to Survey Questions So Far
First, my thanks to all of you who've participated in the research initiative that is the engine of content for this column. Over the past two months, we have gathered answers to over 40,000 individual questions that are part of the Sterling Audits Usability Index and its associated Web-based research portal. STM readers have made a significant contribution to this effort. Many of you heard about the research portal through the open call for researchers, but a large number of our researchers have joined out of professional curiosity. (It surely can't be the paltry $10 stipend we pay for each completed survey.)
As a quick review, the Sterling Audits Usability Index is the industry's first open and non-proprietary methodology for scoring and weighting the usability of self-service systems. Here, we concentrate on voice response, although the survey instrument covers Web site usability as well. The critical success areas we focus on are: 1) Navigation; 2) Content; 3) Usability; 4) Interactivity; and 5) Credibility.
Each of these critical success areas represents dozens of usability-oriented questions, and each of the five areas is weighted equally at 20 percent of a perfect score of 100 percent. No system so far has scored 100 percent, but that's not really the point. The point is to use the Index to benchmark your voice response system against your peers' systems. That data can then drive usability improvements which, in turn, can give you some uplift in efficiency and customer satisfaction.
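To make the weighting concrete, here is a minimal sketch of how a five-area score could roll up. The area names come from the column; the per-area raw scores and the equal 20 percent weights applied to 0-to-100 raw scores are illustrative assumptions, not the actual Sterling Audits formula.

```python
# Hypothetical sketch of the five-area weighted index described above.
# Raw scores and the 0-100 scale are assumptions for illustration.

AREAS = ["Navigation", "Content", "Usability", "Interactivity", "Credibility"]
WEIGHT = 0.20  # each area contributes equally toward a perfect 100 percent

def index_score(area_scores):
    """area_scores: dict mapping area name -> raw score from 0 to 100."""
    return sum(WEIGHT * area_scores[a] for a in AREAS)

# Example: a system strong on Content but weak on Navigation.
scores = {"Navigation": 55, "Content": 90, "Usability": 70,
          "Interactivity": 65, "Credibility": 80}
print(index_score(scores))  # 72.0
```

Because the weights are equal, a weak area drags the total down no matter how strong the others are, which is what makes the Index useful for spotting where to invest.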
In this issue, I have taken one example from each of the critical success areas to share with you. This gives a sense of the survey's granularity and depth. I show here only five questions out of more than 200, so this is by no means comprehensive. The purpose of this issue's column, then, is to inspire you as a reader to get involved in the research initiative.
The Navigation questions in the Sterling Audits Usability Index deal with the way callers are guided through the call logic flow of a voice response system. Callers navigate using input modalities such as Navigation Keys (touch-tone), Navigation Words (speech commands) or a combination of the two. The scripting of the application is the words and sentences the machine speaks to you. Many factors go into a quality navigation experience. These include, to name a few: the usefulness of persona in navigation; menu structure, length and related short-term memory issues; the use of automatic speech recognition; and multimodal input (touch-tones and speech).
There are also Navigation questions dealing with operator access; the use of standard navigation keys and navigation words; seasonal prompts and special alerts; default treatment logic; and over-dialing, or barging in over system prompts. As with all of the survey questions, a score is tabulated based on how these items are used and, most importantly, how they affect the caller.
The example Navigation question we have chosen deals with callers' short-term memories: specifically, whether callers had to hang up and call back to get their work done, or whether the system provided a crisp, easy-to-remember list of choices. It's telling that only 8.4 percent of respondents reported that they were "able to remember all of my choices easily" (a top score of 10 on a 1-to-10 scale). In another recent survey conducted by Sterling Audits, based on 270 randomly selected voice response systems, we found that only 18.8 percent use speech. With speech recognition still that rare, most callers face touch-tone menus and the memory burden they impose, so speech is no panacea for human memory challenges.
Content deals with the informational make-up of a voice response system: the depth and breadth of a system's capabilities to automate otherwise agent-guided tasks. So in this context, the content of a voice response system defines its robustness. It all comes down to a measure of how useful the options are and whether the information you are looking for is accessible. For example, a retail bank's voice response system with no account-balance function would be regarded as light on content.
Other topics covered in the survey's Content questions include the use of certain words and their meaning; asking for instructions and help; and the use of extraneous messages. I chose a simple question for the example here, which deals with how useful the self-service experience was, content-wise, to callers. On a scale of 1-to-10, where one is "I couldn't really do much with the system at all" and 10 is "The system allowed me to complete all the transactions I needed without an operator," we got some pretty interesting results. In short, about 25 percent of callers scored systems at five or below. This means these callers did not get everything done that they needed and had to speak to an operator to finish (or got only part of what they needed). Of course, there are business rules that dictate how much can be done automatically, and there are practical reasons why you might not want to automate everything. But what's interesting is the callers' perspective: the question is posed so that respondents answer based on their own perception of usefulness, not the perception of the enterprise. I'm hypothesizing that far more functions can be automated than are currently being tried. I also hypothesize that callers may be more tolerant of automation than we thought when we embarked on this research initiative.
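The "five or below" figure above is just a share of the 1-to-10 responses. As a sketch of how such a share is tabulated, here is a small example; the individual responses below are made up for illustration and are not the actual survey data.

```python
# Illustrative only: hypothetical 1-to-10 survey responses, used to show
# how the share of low content-usefulness scores could be tabulated.

def share_at_or_below(responses, threshold=5):
    """Fraction of 1-to-10 responses at or below the threshold."""
    low = sum(1 for r in responses if r <= threshold)
    return low / len(responses)

responses = [8, 9, 3, 10, 5, 7, 2, 6, 9, 4, 7, 8]  # hypothetical sample
print(round(share_at_or_below(responses) * 100, 1))  # 33.3 (percent)
```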
Usability is a measure of how easy the system is to use. Of course, what weighs heavily is the human side of the equation: how people judge the ease with which they are able to get useful work done. Users will judge the overall usability of a system based on their perception of its efficiency and their overall satisfaction with the experience. We have established indices on intelligibility; jargon (shop talk); pacing; mnemonic input; application consistency; and the order of menu choices, to name a few. We also ask questions on the number of steps to task completion; being forced to listen; and a personal favorite: turn-taking indicators and machine behavior.
For the Usability example question, we hoped to discover how users perceive the use of turn-taking indicators in voice response systems. We discovered that the four most popular turn-taking indicators are: earcons (also called ear candy, beeps or music); prosody (the music or inflection of speech); syntax (grammatical structure); and dialog pauses. What was striking in the answers so far is that only 41.2 percent of respondents reported that dialog pauses were used as turn-taking indicators in the applications they called. I've concluded from this that there are a lot of voice response systems out there that just don't shut up, so callers have to rely on other indicators to know when it's their turn to speak or hit touch-tones. This looks like a big area of concentration for designers of systems. Let's face it: it's really easy to insert pauses. In fact, in most cases you don't even have to re-record prompts or re-do logic flows to add silence. So I'm curious to find out over time why the use of silence is not more prevalent.
Interactivity deals with the overall transactional integrity of the system: a measure of how predictable and reliable a system is when it is performing dialog transactions, doing host and database lookups, managing caller input and performing error management. Good transactional integrity means that callers can succeed in getting their tasks done in a timely fashion (and with the correct results). Here, we deal with touch-tone and speech co-existence, fall-back to touch-tone and other input-management issues. Also important are the ease of validation and confirmation steps, error management, host response time and general delays, to name a few.
Many of my clients are focused on task completion and transaction failures, since these are central to the issue of containment (the ratio of callers who stay in self-service to those who opt out to a customer service rep). So we've chosen a question that deals with task completion as an example. Here, we use a multiple-choice question: "Based on the transactions the system says it supports, were you able to get them done?" The answers were kind of shocking: only 42.7 percent of respondents claim to have been able to do it all without hassles or re-tries. That means the majority of callers to voice response systems experience errors or simply don't know what to do. This spells big opportunity for speech vendors, because ASR-based systems, if designed properly, can be easier interaction-wise. To be fair, even a well-designed touch-tone system can let callers complete transactions without retries, especially power users, who are typically partial to touch-tone systems.
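The containment ratio defined in the parenthetical above is simple arithmetic. Here is a minimal sketch; the call counts are hypothetical, chosen only to mirror the 42.7 percent figure quoted in this section.

```python
# Containment: the share of callers who complete their task in
# self-service rather than opting out to an agent. Counts are hypothetical.

def containment(self_service_completions, opt_outs):
    """Return containment as a fraction between 0.0 and 1.0."""
    total = self_service_completions + opt_outs
    return self_service_completions / total if total else 0.0

print(containment(427, 573))  # 0.427, mirroring the 42.7 percent figure
```

Tracking this ratio before and after a usability fix is the most direct way to see whether the fix paid off.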
A voice response system's credibility deals with the level of comfort or safety a caller experiences with the system. Credibility means that users believe what they hear, can trust that the information is correct and can trust the people running the voice response system. Credibility also deals with the seriousness with which callers take the system. This is a challenging area because so much of this information is qualitative and subjective. Nonetheless, we have established many metrics on the subject and mix a good bit of quantitative analysis with the qualitative.
Here, we tackle system persona, anthropomorphism (the imbuing of human characteristics onto an inanimate object), machine-centric versus caller-centric behavior, chatty discourse, politeness and tone of dialog. The results from surveying these issues are provocative. What's really interesting is the prospect of de-hyping persona and getting down to what works and what doesn't. Other important aspects of credibility deal with the timeliness of information, consistency of voices (talent), diction, accents, gender, predictability and one of my favorites: dead ends or endless loops.
The example question here deals with apologies and blame. Callers were asked to rate, on a 1-to-10 scale, whether the machine was overly apologetic (a score of one) or seemed to blame the caller for mistakes (a score of 10). Surprisingly, most respondents recorded a score of five or above, which means the majority of systems tend toward blaming callers for mistakes. To be fair, less than one percent recorded a 10, so it's not all that terrible. But what this suggests is that too many systems take out their own failures on the caller. Clearly, blame is no incentive for callers to use automation.