Voice-First Bots and the Future of IVR


Lately I’ve been fielding lots of questions about interactive voice response (IVR) platforms. As background, my firm, Opus Research, was one of the few technology analysis firms to pay constant and consistent attention to IVRs during the years when email and then, in rapid succession, e-commerce websites, mobile apps, and chat all arose to capture and divert their share of calls.

Over the past two years, “conversational commerce” has taken on a new meaning as bots on messaging platforms made their presence known. These text-based tools are hardly conversational, and yet they have captured the hearts, minds, and communications habits of Millennials and “Centennials” (or members of Generation Z, the kids who have come of age this century), who prefer to take command of digital domains with their digits.

And yet, voice-based conversations continue to grow. Some of them, admittedly, never make it to an IVR. When an individual summons Siri, Alexa, or Google Assistant with the respective wake-up words, they initiate actions that, more often than not, are completed without “going off-hook” (i.e., initiating a phone call). Each of these intelligent assistants (which Opus Research often refers to as “metabots” because they are not front-ending an individual brand or dedicated contact center, as your average IVR does) is able to answer questions, hail a car share, book a table at a local restaurant, or buy a movie ticket.

Among the 10,000-plus skills developed for Alexa and the thousands of skills developed for Google Assistant are conversational commerce offerings from banks, hotel chains, retailers, and makers of consumer packaged goods, which have moved beyond mere trials and integrated voice-first bots into their mix of e-commerce channels.

Even though these voice-first conversations do not bring call volume to existing contact centers, we are told that the minutes logged on IVRs and with live agents continue to show impressive growth. How can that be? When will the bubble burst? How does a voice app developer or customer service representative prepare for the inevitable shift in demand from spoken words to text and to interfaces that include selections from rotating carousels, radio buttons, or emojis?

The good news is that the growth of metabots has made voice-first technologies and speech processing cool again. I, for one, no longer have to explain what I’ve been doing with my life for the past 10 years in terms of the “computer that answers the phone and talks to you when you call your bank, cable company, wireless carrier, or airline,” only to hear the inevitable “I hate those things.”

For every person who complains that Siri never recognizes what she says, or who says he’s run out of new things to ask Alexa, there are dozens more who say their kids spend the evening playing games over the Echo, or who find that “OK Google” are the first words out of their mouths when they get up and want to hear the news, plan their day, or turn on the lights in the kitchen. They’ve been conditioned to expect successful outcomes from their conversations with machines.

That sort of success somehow makes speech-enabled IVRs and the “How may I help you?” interface familiar, tolerable, and useful. The results are measurable. Contact center administrators, CX professionals, and other keepers of the IVR report fewer hang-ups and higher levels of containment as callers can talk to IVRs using their own words to describe what they want and then get things done.

What’s more, the older generation of speech technology professionals has forgotten more about dialogue design, turn taking, tagging, and recognition of intent than the tens of thousands of bot developers have yet learned. As they solved the problems of “speech-enabling” IVRs near the turn of the century, they defined the conventions and best practices associated with good voice user interface design. Today’s new crop of conversational interface design specialists can benefit from the wisdom that was gleaned designing for much more primitive, single-purpose computing environments.

Those pioneers can go back to the future, dust off their Ph.D. theses with their talk of neural networks, computational linguistics, and natural language understanding, and feel the rush that comes from knowing that those extra years of education have not gone to waste.

And now digital commerce activity has come home to roost in the IVR—or, more accurately, the voice app on a generic server or the ASR/TTS API on some third-party cloud. Far from reaching the end of their useful lives, IVRs are providing the comfort food for the digital generation. When you think of a friction-free interface, free of special controllers or devices, what comes to mind is someone or something that you talk to and that understands you. In spite of the proliferation of devices and modalities, the speech-enabled IVR has proven to be just that. 

Dan Miller is the founder and lead analyst for Opus Research.
