If this year’s Consumer Electronics Show in Las Vegas in early January was any indication, speech technologies have indeed gone mainstream. With the continuing evolution of mobile devices and the Internet of Things, consumer uses of speech recognition, voice search and control, voice biometrics, and voice-enabled virtual assistants are now all the rage.
Just about every speech-related press release that came out of this year’s CES in some way mentioned Amazon’s voice-activated assistant Alexa, which Amazon seems to be distributing as broadly as possible to developers of all types and sizes. And the development community has responded. In a very short amount of time, the number of available Alexa “skills”—the applications that enable consumers to interact with devices and companies in a more intuitive way using voice—has grown to 6,000, up from slightly more than a hundred at the start of 2016. Consumers can now invoke Alexa to do everything from turning up the heat in their homes to starting their cars, from accessing the latest stock quotes to ordering pizza or booking a ride with Uber.
Internally, Amazon’s own developers have also been hard at work improving the service, expanding Alexa’s conversational skills and natural language processing capabilities and enabling it to better understand what users want without asking as many clarifying questions before responding or acting on a request. Amazon’s developers are also in the lab trying to expand the number of command categories, or intents, that users can throw at Alexa. There were originally only 15 intents, such as help or search, but Amazon would eventually like to move that number into the hundreds, which it says will make it easier for independent third-party developers to build Alexa skills.
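The intent model described above boils down to a simple dispatch pattern: a spoken request is parsed into an intent name plus any slot values, and the skill routes it to the matching piece of fulfillment code. The sketch below illustrates that pattern in Python; the handler names, request shape, and intent names are illustrative assumptions, not Amazon's actual Alexa Skills Kit API.

```python
# Hypothetical sketch of the intent-routing pattern behind a voice skill.
# Names and request structure are illustrative, not Amazon's real SDK.

def handle_order_pizza(slots):
    # A "slot" carries a value the user spoke, e.g. the pizza size.
    size = slots.get("size", "medium")
    return f"Ordering a {size} pizza."

def handle_help(slots):
    return "You can say things like: order a large pizza."

# Map each intent name (the command category) to the code that fulfills it.
INTENT_HANDLERS = {
    "OrderPizzaIntent": handle_order_pizza,
    "HelpIntent": handle_help,
}

def route(request):
    """Dispatch a parsed voice request to the matching intent handler."""
    handler = INTENT_HANDLERS.get(request.get("intent"))
    if handler is None:
        # Unknown intent: ask a clarifying question instead of failing.
        return "Sorry, I didn't catch that. What would you like to do?"
    return handler(request.get("slots", {}))

print(route({"intent": "OrderPizzaIntent", "slots": {"size": "large"}}))
# → Ordering a large pizza.
```

Growing from 15 intents into the hundreds, as Amazon intends, amounts to enlarging that handler table, which is why a richer built-in intent library makes third-party skills easier to write.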
Amazon isn’t the only company fueling a developer feeding frenzy. In general, the development environment for speech applications, and for the integrations that will enable consumers to do more with speech, is very robust today. That segment of the speech market has grown so dramatically in just the past two or three years that we felt it was necessary to add development platforms as a whole new category in this year’s “State of the Industry” report. This new category supplements the updates we have provided for the past few years on speech engines, speech analytics, interactive voice response systems, assistive technologies, and interactive virtual assistants.
In each of these categories, the developer community has grown very active of late. The speed and accuracy problems that once plagued each of these technologies have largely been overcome. Technologies now act in near real time, word error rates have been reduced to about 5 percent, and even noisy environments, which have always challenged speech applications, are becoming less of an issue as companies refine techniques and technologies for microphone placement, noise cancellation, echo and reverberation reduction, and more. Speech interfaces today are far more viable, and usable, across a wider variety of domains.
The focus now is on artificial intelligence, machine learning, and neural networks. The hype around those technologies will go on unabated as vendors work to add them to more of their basic speech products.
Another area of focus for vendors this year will be making their software more conversational, which we can only assume will further expand the uses for speech in even newer, more exciting, more powerful, and more meaningful ways.
2017 will be an exciting year for the speech technology industry. There’s never been a better time to be a software developer, and that isn’t likely to change anytime soon. We can’t wait to see what developers’ labors will yield.
Leonard Klie is the editor of Speech Technology magazine. He can be reached at firstname.lastname@example.org.