The 2017 State of the Speech Technology Industry: Speech Engines
While Dahl notes that these researchers have so far achieved this benchmark only in the laboratory, and only on very specific data sets, Microsoft could become an exceptionally attractive provider of speech solutions if it decides to offer the technology through APIs. Its current dominance of the operating system and business applications markets might make it an attractive out-of-the-box alternative to some other solutions.
In an interesting footnote to the crossover between speech technology mainstays and consumer platform providers, Nuance also released Dragon Anywhere for mobile devices this year, providing users on the move with a more accurate and customizable dictation engine than what might otherwise be baked into their pocket technology.
Getting Off the Cloud
As cloud-based speech engines approach human-like capability, enterprises are starting to look for speech technology they can deploy onsite, and developers are looking to include speech in ways that can maintain operation regardless of cellular or Internet connectivity.
“Now we’re starting to see speech in embedded offerings,” says Dahl, picking out Sensory and Nuance’s Dragon NaturallySpeaking as noteworthy performers.
Sensory's flagship entry in the speech engine market is TrulyHandsFree, a keyword-spotting module that detects user-defined keywords with up to 95 percent accuracy. The software is embeddable via the company's NLP-5x chip and works in Android environments or with Sensory's FluentSoft software development kit. The chip and its associated software can be trained on specialized vocabulary and can operate in high-noise environments. Alongside TrulyHandsFree, Sensory offers TrulyNatural, a natural language processor that can hold up to 1,000 words per megabyte of memory; it, too, runs embedded, with no network connection required.
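The core idea behind an embedded keyword spotter is simple: slide a stored template of the keyword's acoustic features across the incoming audio and flag positions where the match is close enough. The sketch below is a deliberately simplified illustration of that idea, not Sensory's algorithm; the feature values, function names, and the cosine-similarity matching are all illustrative assumptions, and real products use far richer acoustic models.

```python
import math

def normalize(frame):
    """Scale a feature vector to unit length so matching ignores volume."""
    mag = math.sqrt(sum(x * x for x in frame)) or 1.0
    return [x / mag for x in frame]

def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def spot_keyword(stream, template, threshold=0.95):
    """Slide the keyword template over the stream; return matching offsets."""
    n = len(template)
    return [i for i in range(len(stream) - n + 1)
            if similarity(stream[i:i + n], template) >= threshold]

# Toy "audio": the keyword's feature pattern embedded in filler values.
template = [0.1, 0.9, 0.4, 0.8]
stream = [0.2, 0.3] + template + [0.5, 0.1, 0.6]
print(spot_keyword(stream, template))  # → [2]
```

Because the whole loop is a fixed-size sliding window with no network round trip, this kind of matching is cheap enough to run continuously on a low-power chip, which is what makes always-listening wake words practical on embedded hardware.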
Nuance continues to enhance its offerings, with version 15 of its Dragon NaturallySpeaking engine due for release this year, adding deep-learning capabilities to already robust voice command and transcription software. This deep-learning capability improves Dragon’s accuracy with use. It can even scan emails to search for common phrases or specialized vocabulary, delivering the kind of insight and accuracy normally enjoyed only with cloud-based computing.
As the markets for speech proliferate and expand, more companies are looking either to provide speech engines or to benefit from their use. In addition to the opening of proprietary speech engines from companies like Amazon and Microsoft, even notoriously cagey companies like Apple are broadening their ecosystems to embrace the emerging Internet of Things. While this environment of exciting competition offers more options than ever before, clients looking to use speech engines to enhance their operations or to add speech to their own offerings must have a clear sense of their needs, Dahl cautions:
“I think the biggest obstacle for clients is an embarrassment of riches—there are so many options for speech engines available today. Clients should understand that there’s no one-size-fits-all approach to finding a speech engine. Different engines are going to be right for different problems, so by far the most important first step is identifying their needs. They should ask questions like what languages are needed, how much customization is needed, and whether the application can handle sensitive data that would rule out a cloud solution, among many others.”
Beyond the engine itself, costs, time frames, the stability of the vendor, and the availability of development tools should all be weighed. But know that speech has arrived, that it's gotten really good, and that it's only going to get better as time goes on.
Tye Pemberton is a freelance writer based in Savannah, Ga. He can be reached at firstname.lastname@example.org.