Consumer Voice Apps and Smart Devices Lead the Way for Speech Tech

The annual Consumer Electronics Show (CES) in Las Vegas serves as a barometer of what’s new and exciting in the consumer-tech world and what the focus areas will be in the near term. This year’s CES was no exception, but for the first time, there was a dedicated full-day conference on voice technology and applications; key players in the voice assistant segment, including Amazon’s Alexa, Google Assistant, Apple’s Siri, Microsoft’s Cortana, and Samsung’s Bixby, were represented. This is a clear sign of both the interest in consumer voice applications and their maturity.

The voice app and smart device makers seem to be following the approach of “let’s voice-enable devices and then let users figure out whether they’ll take to them.” Individually, some of the new products and applications may seem “meh” or “cool,” but collectively, they represent an exciting phase of innovation for voice and speech technologies.

So what were some interesting announcements? Google Assistant will be able to read out long-form content (e.g., web pages) and optionally translate it as well. Cortana already does something similar: using machine learning, it summarizes the content before reading it aloud. Such functionality is definitely handy, but one of the challenges has been the monotonous, drone-like quality of the digital assistants’ voices. Mainstream adoption of voice applications will increase if the voices start to sound more natural, expressive, and humanlike. Progress in artificial intelligence (AI) is enabling this, but much work remains in making digital assistants sound more realistic in different contexts. Many consumer companies would also like to customize the digital assistants to reflect their brand voice and values, which can also help to improve user engagement and increase adoption.

The trend of voice-enabling smart devices continues at a fast clip, as smart speaker companies have announced more partnerships and integrations. From cars to gaming consoles to microwaves to washing machines to toilets to TVs to mattresses and more, companies have demonstrated that you can issue voice commands to operate these devices. If this trend is taken to its logical conclusion, we can expect most devices to be voice-enabled in a few years. Devices will also start talking to each other (“voice as API”).

I expect smart devices to be voice-enabled and also to provide a display screen as a fallback mechanism or as an additional convenience for users; we will have a combination of voice-only, display-only, and hybrid (voice plus display) use cases. There’s still a long way to go in understanding how voice interfaces work, how voice and visual interfaces work together, the different user interaction models, and the best practices of interaction design. For example, with all the improvements in voice technology, people have started recording voice notes on their mobile phones to send to friends via text or instant messenger. Just think about the user interaction model here: a spoken message, recorded on a screen-based device and delivered through a text channel.

But of course, the prospect of being surrounded by always-on listening devices during your waking and even sleeping hours is troubling. It raises many questions about user privacy and security. One small but promising development has been “voice-based deletes”: the ability to ask your voice assistant to forget or delete what you’ve just said. Here, again, a lot of work needs to be done to reassure users about voice apps’ data collection, data processing, and data storage practices.

Many shiny and new voice apps have emerged for the popular platforms (more than 100,000 skills for Alexa), but the typical user will struggle to discover and use them (unlike with mobile app stores). I won’t say that discoverability for voice apps is broken because we’ve never had a discoverability paradigm to begin with. We are figuring it out as we go along, and that’s par for the course for emerging technologies.

To be sure, there are challenges to be solved—figuring out the real use cases, perfecting user interaction models, addressing privacy and security concerns—but we are approaching the inflection point with voice applications. If you are a marketer or a consumer brand, jump right in and start experimenting with how you’ll use them in your business workflows and customer journeys.  

Kashyap Kompella is the CEO of rpa2ai Research, a global AI industry analyst firm, and is the co-author of Practical Artificial Intelligence: An Enterprise Playbook.
