The Internet of Things: A New Evolution for Speech Technology

Those who have followed this publication since its inception in 1995 know that, for many years, speech recognition has been the speech market’s overwhelming preoccupation. With applications primarily in speech-enabled IVRs, medical transcription, and assistive technology, speech recognition has been a powerful enhancement to B2B, CRM, and consumer technologies. However, mere recognition, once the focus of industry research and development, is rapidly becoming old news.

Several reasons mark this reduction in emphasis. The excitement of the days when speech seemed poised to be the next evolution in consumer user interfaces is over, eclipsed by the era of the context-sensitive touch screen. Old problems persist: biometrics, security, responsiveness. However, the key reason speech recognition is fading from industry consciousness is a positive one: It is simply no longer a problem.

With transcription engines like the Nuance NTE clocking real-time factor speeds up to 10x, with 70 percent to 90 percent accuracy, the line between machine and human abilities to “hear” speech accurately is very thin indeed. Companies using top-tier speech recognition technology increasingly find that they can focus on other issues, like natural language understanding and intelligent assistance. Additionally, companies using lower-cost solutions are getting more bang for their dollar as the wealth of data that has accumulated in the cloud reaches the critical mass necessary to make machine learning functional for even modest speech vendors.

Old venues for speech recognition are diversifying. CRM, which has traditionally leveraged speech for use in interactive voice response (IVR) systems, has been increasingly exploring other avenues to customer engagement. These include mobile FAQs, chatbots, and hybrid “visual IVRs,” which combine a self-service web interface with live-agent calling as a last resort. Although traditional, speech-enabled IVRs persist, these new conduits for customer engagement often bypass the node where artificial speech is most invested.

Where does this leave the speech market? What are the next frontiers?

Speech in consumer technology has risen alongside the smartphone. Apple has popularized the idea of speech as interface with its intelligent assistant, Siri, as have Microsoft with Cortana, and Google with Google Now, to name a few. The difficulty with this market is penetration: If you own a smartphone, it likely came with its own speech recognition engine, its own transcription, its own baked-in intelligent assistant, all of which send speech and utterance data back to a proprietary cloud. If you currently own speech-enabled appliances, chances are they have a speech interface independent of your smartphone’s intelligent assistant: different wake words, different levels of accuracy, different commands, and multiple user profiles.

These devices make up the Internet of Things: everyday objects such as cars, phones, refrigerators, sound systems, and industrial and B2B machinery connected to the internet. They are the next big market for speech, and that market is a confluence of recognition, artificial intelligence (AI), and Big Data. But the companies that have popularized speech technology in consumer consciousness are the very companies that have been undermining its ability to take hold. While we are living in a golden age where consumer expectations for speech technology are growing, those same consumers have been caught in a complicated battle between companies that view their speech tech primarily as an incentive for other products and services. Those companies have little incentive of their own to pursue a truly universal speech interface for products outside their “ecosystem.” This fragmentation has prevented the Internet of Things from being more than a consumer novelty, and to date it has kept the Internet of Things from reaching its full potential.

But things are changing.

Speech as the Premier Interface

“Currently the vast majority of interfaces for the Internet of Things are apps,” says Deborah Dahl, principal at Conversational Technologies. “But as you acquire more appliances that are connected to the Internet of Things, you find you consequently have more apps to control them all—if you want to take advantage of all they have to offer. That’s more user interfaces to learn, more menus to navigate, screen after screen to drill down into. It’s untenable.”
