5 Predictions for Voice Technology in 2023

Miles Davis once said, "I think the greatest sound in the world is the human voice." But in the smartphone age, we've largely abandoned voice and learned to communicate with our fingers. First, we sent messages through a complex, multi-tap dance across the conventional telephone keypad. Then QWERTY arrived on the touchscreen, input became dramatically easier, and a non-stop flow of thoughts and commands was unleashed. Our necks have been craned toward the screen ever since.

There is no doubt that voice is the most natural and convenient communication mode, so it's little wonder that voice has more recently become the preferred interface on smart devices in many contexts. Research estimates that 142 million people in the United States, or 42.1 percent of the population, will use voice assistants this year. And while much has been written in recent months about the slower-than-hoped uptake of voice as an e-commerce engine, the fact remains that as many as 33.2 million U.S. consumers are expected to shop using the voice search feature on their smart speakers this year.

Smart device manufacturers have been quick to integrate voice as an interface across a multitude of touchpoints, both at home and on the go. Companies that provide content, commerce, and communications are adapting to this rapidly evolving landscape, too. In 2022, we saw category leaders like Sonos launch their own voice assistants to establish a more direct relationship and access point with their customers, in some cases taking back control of their brand identity from the general-purpose assistants we know and love. Looking ahead, it's clear that voice will continue to replace text and screens as the interface between end users and companies in many contexts. Below are five voice technology predictions that leading companies and device manufacturers should act on to strengthen their competitive position and unlock new growth opportunities in 2023:

1. Consumers will expect consistent voice experiences across multiple devices.

As consumers increasingly choose voice as a way to interact with their favorite companies, they're going to want to use voice with the myriad apps on their phones without unlocking their devices, and to have those voice interactions sync across every smart device they own, including their cars and TVs. End users will expect a high-quality voice experience that gives them access to the key functions they already use within their favorite apps. In the near term, these experiences won't be complex or conversational, but rather simple, reliable interactions that involve asking for an action to be taken or for a piece of information. Looking beyond the next one to three years, voice experiences will become more complex, engaging, and intelligent, anticipating what end users want and allowing them to combine multiple actions into a single, fluid request (e.g., "Send a car to pick up the kids from soccer and let them know it's on the way").

2. Voice will be the primary input mechanism in AR and VR experiences.

The most natural and efficient way to interact with the real world is with our eyes, ears, mouths, and hands, and the same applies to the digital world. Just as using a mouse in three-dimensional space is tedious and awkward compared with simply pointing a finger, typing is taxing and cumbersome when moving through a virtual environment. While spending significant time in virtual and augmented reality worlds isn't in our immediate future (for most of us, anyway), when that time comes, end users will untether not just from screens but also from the mouse and keyboard, relying on their voices for more natural, efficient interactions.

3. Audio entertainment will increasingly be augmented by voice.

Audio experiences like music, news, and podcasts continue to grow as a medium of entertainment, and they're ideal use cases for voice assistant technology. As voice lets consumers get what they already want from audio devices more quickly and easily, we'll see these voice experiences evolve further. In the coming year, news, music, and podcast services will allow end users to speak directly to them and access a deeper experience. New interactive formats will emerge, such as karaoke, choose-your-own-adventure stories, and interactive ads. Just as consumers can use their fingers in the Spotify app to access their favorite playlists or liked songs, for instance, they'll increasingly look to access rich, personalized content with their voices.

4. Voice will become a preferred interface for purchasing known items.

Voice technology will continue to change how consumers shop, primarily when it comes to known items or items they order regularly. Exploring a new item requires more information than consumers want to absorb through audio, so these shopping experiences will continue to rely on visual displays as the primary means of communication, with voice as the input. Multimodal devices best support integrated visual and audio experiences like this, and as speakers increasingly come equipped with microphones and internet connections, these products can be transformed from audio-only output devices into two-way voice interfaces. In particular, we'll see the restaurant industry develop more voice ordering capabilities, especially given continued labor shortages in that sector. Expect to see more voice interfaces at drive-thrus and in popular food delivery apps in the near future.

5. The voice technology stack will become more democratized.

Low-power edge voice recognition will be critical in the coming year. The technology will begin to be integrated into the primary system on a chip (SoC) of audio devices, eliminating the cost of adding a separate chip. We are trending toward a future where natural language understanding (NLU) technology will also become more specialized by sector or vertical, with one or two NLUs emerging as the de facto solution for common voice-based experiences like ordering food, banking online, or interacting with music. Overall, it will become less expensive and much easier for companies to rent specialized voice stacks than to build them from the ground up.

While the adoption of new technologies is often unpredictable, every indication is that voice is on track to become a primary interface for common experiences like watching TV, interfacing with electric cars, ordering food, and listening to podcasts in the near term. Voice as an interface is an important early step toward ambient computing, and we're already seeing consumers interact with more and more computing power through more natural, heads-up experiences. Companies and device manufacturers have a timely opportunity to support the growing demand for voice technology and, in doing so, deliver engaging, productive experiences that integrate more seamlessly into consumers' daily lives.
