The 2016 State of the Speech Technology Industry: Speech Engine


Speech API

Speech recognition now goes hand in hand with natural language understanding, and development on both fronts in 2016 will be a matter of Big Data.

“The accuracy rate of speech recognition in just the past sixteen months is better than ever before,” Miller says, pointing to the quantum leap that speech tech has made as the corpora of utterances have grown.

“There’s going to be a shared effort across some of the Silicon Valley giants,” he continues, citing, along with Dahl, Facebook’s acquisition of Wit.ai, an API that allows developers to build voice-activated interfaces. Facebook has kept the API open, in contrast to VirtuOz, a former voice-command virtual assistant from Wit.ai cofounder Alexandre Lebrun, which was sold to Nuance in 2013.

Dahl also mentions Microsoft LUIS (Language Understanding Intelligent Service), the IBM Watson Cognitive Computing Suite, and Api.ai as software development kits to watch as independent programmers look to add speech technology to their applications.

She notes that there is an emerging niche for offline speech recognition. “Sensory is really the leader in that area,” she says. “They’ve addressed the market and moved into larger vocabulary applications.” Both Sensory and the smaller German company Linguwerk specialize in accurate speech recognition with low memory and power requirements.

Intelligent Enterprise Assistance

While the consumer market for speech finds its way, enterprise intelligence systems continue to develop. Analysts at this year’s Gartner Symposium predicted that by 2018, 45 percent of the fastest-growing companies will have fewer employees than smart machines. As businesses look to scale up without adding new workers, that shift paves the way for further developments in IVR and natural language understanding.

Miller asserts that about 15 percent of enterprise intelligence systems already had automated speech to some degree in 2015, and that the percentage will grow in 2016.

“You’ll see vendors like Next IT, Creative Virtual, [24]7, and IntelliResponse selling some kind of conversational resource that can answer questions when customers or prospects call a contact center, doing what IVR used to do in terms of supporting automated customer assistance,” he says, singling out industry leader Nuance as a front-runner. “Nuance is going to use Nina Web and Nina Mobile as a differentiator. Both products are natural language and support chat, but they’re primarily speech.” Dahl agrees, citing Openstream’s EVA as a likely candidate for businesses adding intelligent assistance to their customer interaction strategies.

Miller also highlights Interactions, a CRM provider that has acquired AT&T Watson, which includes full automatic speech recognition capability, text-to-speech, and a voice biometrics engine.

Other Developments and Predictions

“Part of the reason that things like Echo and Siri are getting so much better so quickly is because computing capabilities are starting to make deep learning more attainable,” Dahl says. “In the last few years computer scientists have returned to the idea of neural nets, except now they’re able to make them much more layered between the inputs and the outputs. It makes it much easier to train systems. The inputs that they’re trained with can be simpler, whereas they used to have to be very laboriously put together. In the case of speech, people had to transcribe the utterances for the training data.”

Dahl also notes the emergence of State Chart XML from W3C. “The applications it’s good for are basically controlling dialogues. I’m hoping that once it’s finished, it’s going to be able to improve dialogue processing.”

Miller sees voice biometrics evolving from a proprietary authentication technology into something more integrated. “For instance, in Apple’s iPhone 6s, Siri’s wake-up is personalized. Once I’ve said ‘Hey, Siri’ to it three times, I can wake it up with IVR, but my wife can’t. So that’s the beginning of the use of something like voice biometrics as personalization. The long story will be your voice as a stronger citizen of your identity bringing along with it all your entitlements and building a trusted link between you and whoever or whatever is on the other side. That’s going to create a very fertile area for successful digital commerce.”

Tye Pemberton is a freelance writer based in Linwood, N.J. He can be reached at tyepemberton@gmail.com.
