The Future Is Full of Possibilities
As we’ve done for the past few years, Speech Technology magazine is again dedicating its first issue of the new year to a preview of what’s to come in our small corner of the world. We’ve highlighted the six technology areas where we see the most impact: speech engines, speech analytics, voice biometrics, virtual assistants, speech developer platforms, and assistive technologies.
These six categories cover just the top business use cases; we barely scratch the surface of what is happening with speech in the consumer market. From Toyota’s introduction of Amazon Alexa into select Toyota and Lexus vehicles and Flex’s augmented reality smart glasses with voice recognition to Delta’s Voice Activation Faucet, speech technologies were on full display at this year’s Consumer Electronics Show (CES), the gathering held in Las Vegas every January to showcase every new gadget, gizmo, device, and piece of circuitry imaginable. It seemed that every exhibitor at CES 2018 was displaying some product that integrated with Amazon Alexa, Google Assistant, Apple’s Siri, Microsoft’s Cortana, or some combination of the four. Most of these products are already available; others are soon to be released.
The 2018 CES proved voice can be used to control everything from the fridge and the clothes dryer to the toilet, carbon monoxide detector, and lawn sprinkler system. Voice truly is creeping into every aspect of day-to-day life, and nothing is off-limits.
In the business world, speech is nowhere near as ubiquitous, and its impact isn’t as dramatic, but the use cases are also on a sharp upward trajectory. In each of the six categories we covered, industry insiders are predicting robust growth, at least for the next few years. For speech and voice recognition and biometrics, analysts predict nearly 20 percent compound annual growth during the next five years, and all of the other categories are pacing toward double-digit growth as well. Even among speech developer platforms, analysts maintain that the building blocks are now firmly in place and applications are sure to take off.
But none of this can, or will, happen on its own. Analysts were quick to point out that speech technologies are not without their challenges. Cost is at the top of the list, and it is especially consequential for the people who rely on assistive technologies to communicate with the rest of the world. According to the World Health Organization, only about a tenth of the people with disabilities worldwide who could benefit from speech technologies can actually afford to use them. Advocates for the disabled are pushing governments to pick up some of the costs of assistive speech technologies, but the industry clearly has work to do as well.
The speech industry has made great strides in tackling the accuracy issue; with improvements through machine learning and artificial intelligence, solutions do what they are supposed to more than 90 percent of the time today, up from somewhere around 70 percent just a few years ago. There is still work to do there, especially in the voice biometrics area, where even one small mistake can result in serious financial losses. Ninety percent accuracy, while impressive, still leaves room for improvement.
The industry has also made huge advances in natural language processing, moving far from the robotic interactions of the past to more conversational dialogues. If the ultimate goal for the speech industry is to mirror human-to-human conversations—and most would argue that it is—then the industry still has plenty of work to do there as well.
And, as a final chore for 2018, vendors should work on expanding the number of languages that their systems support. Even the most urbane of vendors support only a few dozen languages; nearly 7,000 are spoken around the world. That, too, leaves a gaping hole to be filled.
The speech industry has plenty of reasons to be both proud and optimistic. Opportunities abound, but so do some pretty significant obstacles. Luckily, none of them seem insurmountable.