The 2015 State of the Speech Technology Industry: Speech Engine
packages for PCs like Dragon NaturallySpeaking, and server-based speech recognition for call centers."
LumenVox, Scholz points out, also grew from only selling TTS to offering ASR and now a full speech product suite. Interactions now focuses on natural-language sales, service, and support solutions across multiple device types using type, touch, or talk. IntelliResponse—recently acquired by [24]7—offers a virtual agent solution for customer self-service that can handle both text and speech input. Aspect has undergone a similar transformation with its Mobility Suite, which incorporates visual IVR, text self-service, text-to-SMS (short message service), and callback.
"These and other similar companies highlight the trend in speech technology toward integrated solutions where the core technologies, like ASR, TTS, speaker verification, natural language, and dialogue management, no longer appear in the advertising headlines, replaced instead by products that solve real-world problems," Scholz says.
The trend has also worked in reverse, however. Miller says the current speech engine marketplace has morphed into a "competitive landscape that is fragmented beyond recognition."
This, he says, is due to several realities. The first is specialization by vertical, with enterprises looking for very specific solutions that address their unique problems. The needs of healthcare providers for transcription and document management, for example, are very different from those of retailers seeking rapid resolution of customer care or sales issues.
Further fragmenting the industry is the need to support multiple mobile engagement models, Miller says, noting, "Shrink-to-fit efforts to embed voice processing and related applications on mobile devices or automotive electronics remain a challenge."
Even in the best of cases, Miller adds, "the notion of a miniaturized speech engine is a bit misleading," and the most robust solutions rely on cloud-based voice processing and information processing resources.
Cloud-based consumption models have further fragmented the industry, Miller maintains. "The inexorable move of customer care, voice search, and intelligent assistance applications to the cloud puts a crimp in near-term revenue recognition," he states. "Rather than buying an engine and paying on a per-port or per-server basis, vendors are finding that their customers and go-to-market partners will pay by transaction or, in some cases, on a success basis."
Going forward, speech engine vendors will need to better support developers with easy-to-use and well-documented application programming interfaces and software development kits, according to Dahl. Developer programs, such as Nuance's NDEV, are also important in enabling developers to explore speech technology without the high cost, she says. "There are many ways to apply speech technology, and a lot of them, such as accessibility applications, are extremely valuable, but they address markets that are too small to justify professional services engagements."
But even Nuance, clearly the largest and most influential speech engine vendor, needs to do a better job supporting mobile application developers, Dahl contends.
Other top vendors include Google, which, Dahl says, "has very good speech recognition technology" that is accurate and supports many languages. "However, Google seems to be very focused on the use of speech recognition for only a few applications, specifically Web search and free-form recognition," she says. "Recognition for the kinds of focused tasks that normally require a grammar or a statistical model will be less accurate."
Microsoft also was credited with having accurate speech recognition technology, but experts are concerned that its desktop and server versions don't appear to have been updated recently. Microsoft’s newer efforts in speech technology appear to focus on the mobile market.
One newcomer to the speech industry is Amazon, which is investing heavily to bring speech to the home environment with its Fire TV and Echo, a voice-enabled music player, speaker system, and personal assistant.
Most of the company's recent speech developments have come from its January 2013 acquisition of Ivona Software, which provided the text-to-speech and voice guide features on Amazon's Kindle Fire tablets. Industry watchers expect even more acquisitions.