Speech Technology Magazine

 

Companies Will Need to Make the Most of #VoiceFirst

Voice will become the interface of choice in industries far outside consumer electronics, and the possibilities are endless.
By Dan Miller - Posted Apr 23, 2018

This year, much of the discussion in the halls of the Consumer Electronics Show 2018 could easily have taken place at a SpeechTEK Conference. That’s really saying something!

Finally, electronics manufacturers, appliance makers, and automakers are joining the makers of mobile devices and phones to ensure that hundreds of millions of intelligent devices can understand spoken words and act on them.

The phenomenon was dramatically validated in August when Kleiner Perkins analyst and venture capitalist Mary Meeker cited the expansion of voice in her “2017 Internet Trends Report.” She credited its popularity to the growth of voice search on mobile devices (attributable primarily to Google Voice Search). This, in turn, was traced to a general improvement in the accuracy of speech recognizers, which is destined to get even better with the introduction of faster media processors and the broader implementation of deep neural networks, machine learning, and artificial intelligence.

Meeker noted that smartphone sales were starting to show signs of peaking while those of Amazon Echo devices were starting to take off. Indeed, research conducted by Edison Research and NPR indicates that in the space of two and a half years, Amazon has managed to place Echos in 11 percent of U.S. homes; Google Home–branded intelligent speakers, meanwhile, are in another 4 percent of homes.

Based on these results, Bret Kinsella at Voicebot.ai has determined that there are 39 million smart speaker owners in the United States, delivering a media reach on the order of 100 million individuals who routinely turn to such voice-first devices not just for Web search but for shopping, entertainment, suggestions, or conversations. Because the core platforms for all of these services come from technological behemoths that include Facebook, Apple, Samsung, and Microsoft, in addition to Google and Amazon, the range of voice-first interactions and transactions has already expanded well beyond mere consumer electronic devices.

In the past, I described local conversational platforms as “speech-enabled devices.” Having speech processing reside on devices at the edge of wireless networks has made wake-up words like “Alexa” or “Hey Google” possible. But the magic starts after the speech-enabled devices do their work. “Voice-first” (or #VoiceFirst on Twitter) is a much more accurate term because our voices are a mere gateway to a broad set of services, conversations, and transactions that have video components, may involve screen-sharing, and, in the very near future, will embrace augmented and virtual reality.

Amazon’s Echo Show, a voice-first video device, has already inspired Google to introduce a direct competitor. Facebook is reportedly developing its own voice-enabled video device called Portal, which it plans to announce in May. Lenovo is bringing out a smart display. The ranks of providers are poised to grow.

And one of the messages from CES is that the automobile is next. Apple’s CarPlay has made Siri a standard for voice control and search in GM, Honda, Hyundai, Kia, and even Ferrari vehicles. Ford Sync has handled voice control of entertainment and other functions since 2009, but Ford and all the other automakers have made it clear that voice-first services in cars are ready to move to the next generation. 

One of the biggest challenges will be providing a consistent interface and experience across all of the #VoiceFirst devices people use in the course of their day. We are only at the beginning of defining what that means, both from the user’s perspective and from the point of view of companies that are defining engagement models—ones that are aware of each customer’s activities and preferences over a customer journey spanning many flavors and varieties of #VoiceFirst services, devices, and contexts. 
