Why Hasn’t Speech Recognition Gotten Better?
I got in my 2016 Toyota Sienna recently and asked for directions to an address, and it responded, “You want directions to McDonald’s, is that correct?” I tried again, enunciating, and I was asked if I’d like to find a Best Buy near me. No! So here it is, 2016, and speech technology is nowhere near that imagined in the classic sci-fi flick 2001: A Space Odyssey (released 48 years ago). Where is HAL? Not that I want him to take over my car, or run for president. My question is: Why isn’t speech technology better?
Sometimes I feel I’d be doing a disservice recommending speech technology to customers. A number of large companies still have older contact centers, where agents use mainframe “green screen” 3270 sessions and there is no IVR screen pop, and hence, no caller information. It seems foolish to recommend investing in an IVR user interface rather than updating the underlying infrastructure. At the end of the day, technology is supposed to be an enabler, not an end in itself.
As designers, we need to be honest with our customers; they view only one or two pieces of the puzzle while we see the whole picture. Even when technology works as we imagine it should, it will not fix broken processes or infrastructure. Our clients’ customers will not be impressed with a glitzy user interface if underlying systems are unable to access relevant information quickly and accurately.
At a recent meeting, I asked how many people used and liked to use speech technology. Only one out of the 12 managers did. When I asked the remaining 11 why they didn’t, they cited reasons such as a disinclination to say sensitive information out loud over the phone, or the technology not allowing them to speak to a human when they needed to.
These answers don’t reflect a problem with speech technology so much as a problem of perception. (I’ve often wondered why callers prefer to say sensitive information to an agent instead of a machine.) The bone of contention is not the technology itself, but rather the combination of the caller’s ideas and a company’s policies and procedures. It seems that in our intense focus on the user interface, we often neglect the process itself, even though the process is what drives the customer experience.
The fault clearly does not lie entirely with designers. While speech technology has been adapted for use in various products and services, the actual recognition rates and base algorithms are not moving forward as rapidly as in times past. Yet licensing, development, and deployment remain expensive, especially if you throw multiple language options into the mix.
So why hasn’t speech recognition improved as rapidly over the past few years as it once did? Is it simply because it’s perceived as being “good enough,” and people are willing to overlook its deficiencies? Have we become so complacent that now we’re satisfied with speech working part of the time? Or perhaps customer service has been bad for so long that any attempt to improve will make people feel we’re on the right track?
Over the past 10 years, the speech community has invested in marketing and sales but made few, if any, incremental enhancements to the technology itself. Whether in cars, in IVR systems, or on PCs, speech technology is simply not good enough to use all of the time. And even though Siri is a hit and Google is now a verb, high implementation costs prohibit either from being used by companies for their voice self-service systems.
Where does that leave speech scientists or VUI designers? How do we assist with the demand for better service? Some companies are going back to “press or say” options to offer speech without the expense. Directed dialogue and, for those companies that can afford it, natural language have also been successfully implemented. But even agent-assisted IVR, with its 100 percent recognition rate, comes up short, for it does not qualify as true speech recognition technology.
Anyway, back to my vehicle: I complained to Toyota about how poorly the speech technology performed, and they upgraded the software. I picked up the van and tested it out by asking for directions to my house. The Sienna responded by offering directions to McDonald’s! If the car insists on leading me to restaurants, I only wish it had better taste.
Vicki Broman is the manager of the voice user interface (VUI) design team at eLoyalty, part of the Customer Technology Services division at Teletech.