March 8, 2004
By Robin Springer, president, Computer Talk
Voice Value

It's All About the Caller

In this electronic era of wireless PDAs, e-mail and the Internet, where does your telephone rank on the must-have scale? According to the Gartner Group, 92 percent of business transactions are completed over the phone, so it’s pretty high up there. Call centers handle as many as 55 million calls each day internationally, with live agents accounting for as much as 70 percent of the cost of the call. Minimizing live agent involvement can cut the cost from as much as $15 per call to less than a dollar, but this needs to be accomplished without compromising the perception of customer service. ScanSoft estimates that its customers who deploy ASR see ROI in an average of 9.5 months, with an average savings of more than $1 million. Nuance reports similar savings.

The statistics sound great. ASR saves time. ASR saves money. ASR is good. But what are users saying about the technology?

I asked a few Los Angelinos what they thought about ASR. Their responses ranged from "(I) don’t use it" to "I’ve never had to talk to a robot," so I tasked these individuals with calling at least one company that employs ASR and then spoke with them again. Interestingly, the actual experience was more positive than the perception, possibly because the users did not differentiate between IVR in general and ASR specifically, recognizing real human agents as one category and everything else (IVR, ASR, DTMF, Robot, etc.) as the other.

My test group liked or would like to see the following features in future iterations of ASR:

Explain what users can expect. For many callers this is their initiation into ASR, and we all know the importance of a positive first impression. Describe the process briefly and provide a "skip" prompt for users who want to bypass the introduction.
Make the opt-out obvious. It doesn’t have to be the first thing we hear from name-your-robot, but it’s nice to know it’s there. If we know how easy it is to access, we may press on, or speak on in this case, to attempt completion of the call.
Include profanity in the grammars. If the caller is swearing at a virtual operator, he is not happy. Route him to a live agent. The same goes for prosody: a caller whose tone of voice signals frustration should be routed the same way.
Offer the option of using voice or touch tone input. For the user who likes to multi-task, tracking an order with Dell Computers while eating lunch at his desk, it would be helpful to have the option. This also serves as an additional access mode for users with disabilities, who may have difficulty navigating a system that employs ASR without other input options. In general, providing multiple input options will enable systems to accommodate the widest range of disabled users.
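
For readers who build these systems, the last three suggestions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any vendor's product works: the word lists, menu entries and the names `Turn` and `route` are all hypothetical, and a real deployment would use a proper recognition grammar rather than simple word matching.

```python
import re
from dataclasses import dataclass

# Hypothetical vocabularies, for illustration only.
PROFANITY = {"damn", "hell", "crap"}
OPT_OUT_WORDS = {"agent", "operator", "representative"}

# Hypothetical menus: the same destinations are reachable by key or by voice.
DTMF_MENU = {"1": "track_order", "2": "billing", "0": "live_agent"}
VOICE_MENU = {"track my order": "track_order", "billing": "billing"}


@dataclass
class Turn:
    mode: str   # "voice" or "dtmf"
    value: str  # recognized text, or the key that was pressed


def route(turn: Turn) -> str:
    """Return a destination name for one caller turn."""
    if turn.mode == "dtmf":
        # Touch tone input: look the key up, reprompt on an unknown key.
        return DTMF_MENU.get(turn.value, "reprompt")

    text = turn.value.lower()
    words = set(re.findall(r"[a-z']+", text))

    # A swearing caller is an unhappy caller: hand off to a person.
    if words & PROFANITY:
        return "live_agent"
    # The opt-out is always available by voice.
    if words & OPT_OUT_WORDS:
        return "live_agent"

    for phrase, destination in VOICE_MENU.items():
        if phrase in text:
            return destination
    return "reprompt"
```

In this sketch the profanity check comes before the menu lookup, so a frustrated caller reaches a live agent even if the rest of the utterance happens to match a menu item.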

Current systems got high marks for calculating postage from the U.S. Postal Service, verifying departures and arrivals from airlines, and transferring to an extension by saying a name instead of pushing buttons on the keypad. When the tasks became more complex, such as navigating to the correct department of a corporation, the communication breakdown appeared to result from poor call flow, not from inadequate recognition.

For example, in this day and age is it really so difficult to route a call without making users disconnect? If users are asked whether they want Department A or Department B, it’s not endearing to be instructed to hang up and dial another number. And if your "robot" is going to remind callers to say "help" if they need it, don’t send them back to the main menu to start the labyrinth from scratch.

Affected speech poses a problem for ASR systems and often for human listeners. Michael Cohen of Nuance Communications says that while it is possible to factor affected speech into training data for an ASR system, significant challenges would be inherent in designing a system to integrate distortion relating to affected speech. If a system is going to recognize affected speech, variations of affectation must be considered, because the effects of speech disabilities may vary quite a bit between users. And, as one population is oversampled, the recognition rates of other populations may be compromised.

What about a system determining whether a caller has affected speech? There could be two or more acoustic models. When individuals call in, they would be routed to the acoustic model that is most similar to their speech patterns. Today, users do not have the option of creating acoustic models specific to the idiosyncrasies of their voice. But perhaps some day we will have this capability, enabling users to upload their speech files to a network for use in navigating ASR.

Implementation of ASR is still relatively new. As we accumulate experience in interface design and as laggards adapt to the new technology, the ongoing exchange of information between designers and users will further refine the products and improve usability.

Robin E. Springer is president of Computer Talk, a consulting firm specializing in the design and implementation of speech recognition and other hands-free technology services. She can be reached at (888) 999-9161 or by e-mail at info@comptalk.com.
