The 2016 State of the Speech Technology Industry: Speech Engine
Last year, we reported that speech had finally come into its own as a consumer phenomenon. While most projections made since Apple introduced its intelligent assistant, Siri, in 2011 anticipate consistent growth for the industry, leading analysts suggest that the metrics for measuring that growth may change. Estimates projecting a 22.07 percent compound annual growth rate (CAGR) for speech engines through next year cite speech recognition in mobile devices as a heavy influence, an area that increasingly serves the narrower business objectives of individual device makers rather than the speech industry as a whole.
"Only a handful of companies charge license fees based on something," says Dan Miller, lead analyst and founder of Opus Research. "It used to be ports on an interactive voice response server. There are only a few companies still doing that. Now, it could be hits on a KPI. In contrast, Apple has a lot going on in speech recognition, and it's based on increased revenue and selling more phones. Google has tremendously accurate speech recognition used in a way that enhances its ad revenue."
On the other hand, Global Industry Analysts has recently forecast that the face and voice biometrics market will reach $4.7 billion by 2020, driven by growing demand in defense, global banking, and healthcare. These estimates include industry leaders in stand-alone speech technology such as Sensory Inc. and Nuance Communications.
With the major consumer brands having made their initial forays into the speech market, the outlook going into 2016 is that while use of speech technology is increasing, its growth independent of other products and services may be slowing, as large companies that were previously not competitors, such as Microsoft, Google, and Amazon, continue to develop speech technology and data for their own proprietary ecosystems.
"I see a little bit more penetration into the ambient home environment because Amazon Echo is a really capable device," says Deborah Dahl, principal of Conversational Technologies, who notes that speech is the ideal candidate to solve the glut of user interfaces in the consumer space: "As speech-enabled devices become smarter, the graphical paradigm is going to break down."
While Amazon has so far declined to release sales-to-date figures for the Echo, its speech-enabled wireless speaker, the device was Amazon's best-selling item priced over $100 on Black Friday.
The Echo speaker is powered by Amazon's proprietary voice assistant, Alexa, which lets customers carry out Amazon-related activities ranging from listening to music on Amazon Prime to ordering items from their wish lists. In addition, it can control various enabled devices from WeMo, Philips Hue, SmartThings, Insteon, and Wink, with a list of functions that continues to grow as developers build on its application programming interface (API).
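To give a sense of how developers extend Alexa, the sketch below mimics the JSON request/response envelope of the Alexa Skills Kit's custom-skill interface. The overall shape of the payload (a `version` field plus a `response` containing `outputSpeech`) follows Amazon's documented format, but the handler logic, the intent name `TurnOnLightsIntent`, and all speech text are invented for illustration and are not part of Amazon's API.

```python
# Illustrative sketch of an Alexa Skills Kit (ASK) custom-skill handler.
# The JSON envelope shape follows the ASK format; the intent name and
# speech strings below are hypothetical examples.

def build_speech_response(text, end_session=True):
    """Wrap plain text in the ASK-style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def handle_request(event):
    """Dispatch an incoming request by its type (greatly simplified)."""
    req = event["request"]
    if req["type"] == "LaunchRequest":
        # User opened the skill without a specific command; keep listening.
        return build_speech_response("Welcome. What would you like to do?",
                                     end_session=False)
    if req["type"] == "IntentRequest":
        intent = req["intent"]["name"]
        if intent == "TurnOnLightsIntent":  # hypothetical smart-home intent
            return build_speech_response("Okay, turning on the lights.")
    return build_speech_response("Sorry, I didn't catch that.")

# Simulated invocation with an IntentRequest:
event = {"request": {"type": "IntentRequest",
                     "intent": {"name": "TurnOnLightsIntent"}}}
print(handle_request(event)["response"]["outputSpeech"]["text"])
# prints "Okay, turning on the lights."
```

In production, a handler like this would typically run behind an HTTPS endpoint or an AWS Lambda function, with Amazon's service performing the speech recognition and delivering only the structured intent to the developer's code.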
Alexa likely grew out of Amazon's 2013 acquisition of Ivona Software, and this pattern of acquiring and absorbing speech technology is precisely what complicates the current speech market.
Both Dahl and Miller anticipate the next consumer speech market developments to center around issues of interoperability.
"It'll be interesting to see how these products play out against the Internet of Things," Dahl says. "The Echo is starting to be integrated into appliance control, and I'm sure the Jibo"—an Indiegogo-funded intelligent assistant robot spearheaded by Cynthia Breazeal, director of the Personal Robots Group at MIT—"will be as well. But how are the ecosystems going to play out against one another? For example, Apple has a home environment ecosystem. Is a consumer going to get that? Are they going to integrate the Echo?"
Miller asserts that there will be a breaking point with proprietary blockades. "Some will be closed," he says. "Apple is very closed off. But you'll see sharing. For instance, Siri is going to need to talk to some other robots to make a dinner reservation, or to find goods and services. That's where things are kind of going."