2023 Vertical Markets Spotlight: Speech Technology in Consumer Electronics
Within the past decade or so, speech technologies have advanced quickly in the consumer electronics arena. What started as an interface that was available only on a select few smartphones is now available on just about everything. The biggest names in tech, including Microsoft, Google, Apple, Amazon, and Samsung, have taken speech mainstream on an array of devices that enjoy widespread use.
Consumers today expect to be able to vocally interact with their cars, refrigerators, vacuums, watches, washing machines, air conditioners, garage door openers, home security systems, and more. And the technology is expected to proliferate and improve, eventually becoming the preferred way consumers interact with the world around them.
To demonstrate just how much traction these devices and systems have gained, Parks Associates found that 38 percent of U.S. internet-connected households own at least one smart home device, like a smart thermostat, door lock, or video doorbell. Smart TVs are now in 63 percent of internet-connected households, and nearly 40 percent of U.S. households own some type of smart security solution, such as an alarm system, smart doorbell, or smart door lock.
Research from IDC confirmed the trend. According to its data, the worldwide market for smart home devices swelled by 11.7 percent in 2021, and more than 895 million devices were shipped that year alone.
Among some of the more noteworthy voice-enabled introductions are the Yale Assure Lock, which integrates with voice assistants like Alexa or Siri, so people can unlock and lock doors via voice commands; the Haiku Ceiling Fan, which can be voice-controlled through Amazon Alexa and Google’s Nest; the Garageio garage door opener, which can be operated via voice remote control or an integration with Amazon Alexa; Vocca, a compact plug-and-play gadget that connects to any standard light bulb socket, transforming standard lights into voice-activated bulbs; and the Roboking Automatic Bagless Vacuum, which lets users invoke different cleaning patterns with voice commands.
Consumer electronics manufacturers need to regard voice control as a standard feature of everything they make. But while the technology has gained steam, it still requires a great deal of complex application development and systems integration work. Emerging standards promise to simplify the process and spur further adoption.
For device manufacturers, the effort is worth it. “Typical touch controls are not always practical,” explains Seth Sternberg, product manager for sensors and audio software at CEVA, a signal processing systems provider. “People cannot reach the pull chain on a fan switch. Also, sometimes, one’s hands are not available. They may be dirty or be holding a load of laundry. Voice offers a way to turn it on or off.”
Smart Hubs Provide Central Control
While many smart home devices still operate autonomously, there is interest in smart home hubs, like Amazon's Alexa-powered Echo, that connect every device to a central system, so in essence the house runs itself. In the ideal scenario, a person would be able to pick up his smartphone on the way home and tell the system he's returning. The lights come on, the thermostat kicks up a few degrees, and the shades open as he pulls into the driveway.
Simplicity is the main selling point of voice-activated products. Consumers don’t want to go through the hassle of learning a new remote or app, programming or training systems, or going through any number of other complex routines just to get their devices functioning properly.
While the potential benefits are significant, fitting the pieces together has been difficult. Currently, many of the needed elements are available only at the component level. As a result, consumer electronics suppliers have had to tailor their devices' microprocessors to add voice user interfaces (VUIs). Chip vendors are filling different product voids. In essence, suppliers have been slowly building a new software ecosystem from scratch.
In January, for example, CEVA tied its audio front-end software to Alexa Voice Service (AVS), Amazon's cloud-based service that allows device makers to integrate Alexa features and functions into compatible devices. To further refine the voice functionality, CEVA added its own ClearVox far-field noise reduction and voice processing software and its WhisPro neural network-based keyword-spotting software.
And in April, Renesas Electronics, a supplier of advanced semiconductor solutions, launched a 32-bit microcontroller-based application-specific standard product (ASSP) for voice-controlled human-machine interface systems. Targeting home appliances and toys, the new ASSP supports multiple languages and user-defined keywords for voice recognition operations. The solution is a building block for turnkey voice control solutions.
Many other consumer electronics suppliers would like to work at higher levels, take on less microprocessor integration work, and focus more on higher-end, customer-facing, differentiating functions. The problem is that each voice supplier has developed its own way of recognizing speech. As a result, every time a third party connects to one of these systems, a lot of complex application integration work has to follow. And to bring on another device or operating system, manufacturers often have to start from scratch.
This has slowed market growth. “While there’s plenty of growth to be had in the smart home market, there are still challenges ahead, [like] a lack of interoperability,” says Jitesh Ubrani, research manager for IDC’s Mobility and Consumer Device Trackers.
Standards would ease the burden of connecting systems, and that work has begun. In December 2019, Amazon, Apple, Google, and the Connectivity Standards Alliance (formerly known as the Zigbee Alliance) founded the Connected Home over IP (CHIP) working group. The group eventually created Matter, an open-source connectivity standard for smart home devices. The standard is based on the Internet Protocol and works through one or more compatible border routers. Matter products run locally, relying on internet connections only to talk to the cloud. The system is designed to avoid the use of multiple proprietary smart hubs: a vendor writes software once, and it works with any hub. In addition, Matter is intended to enable cross-platform use of smart home devices, mobile apps, and cloud services, and it defines a specific set of IP-based networking technologies for device certification.
Version 1.0 of the specification was published in October 2022 and supported lighting products; door locks; thermostats; heating, ventilation, and air conditioning (HVAC) controllers; blinds and shades; home security sensors (such as door, window, and motion sensors); and televisions and streaming video players.
Version 2.0 of the specification has been working its way toward a final sign-off, expected soon. The working group has focused on adding support for robotic vacuum cleaners, ambient motion and presence sensing, smoke and carbon monoxide detectors, environmental sensing and controls, closure sensors, energy management, Wi-Fi access points, cameras, and major appliances.
While work on developing the standard has been moving along, delivery of compatible devices has been lagging. Solutions were expected to arrive in 2021 but were pushed back because of complications from the pandemic. The first wave of devices is expected to start arriving later this year.
Interoperability is not the only challenge, though. Many of the early voice-enabled consumer devices used spoken language that was translated into machine instruction by processors running in the cloud. That required these systems to be always on, listening for input on which they could act. Consumers weren't necessarily comfortable having devices that were always on, hanging on their every word. Such processes also created a lot of latency, as systems had to wait for signals to bounce around the ether before they could be processed.
Today, however, advanced algorithms are moving much of the voice and audio processing to what is referred to as the edge, away from the cloud. This has helped improve VUIs, ease some of the privacy concerns, and lower latency for quicker, more accurate voice recognition. It has also proven more cost-effective in most cases.
At the same time, computing at the edge requires more processing power on the device itself, which has led to more selective wake-word detection to limit power consumption.
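The power-saving pattern behind selective wake-word detection is straightforward: a cheap, always-on first stage screens incoming audio and wakes the far more power-hungry recognition model only when there is something worth hearing. The sketch below illustrates the idea in Python; the frame size, energy threshold, and signal values are illustrative assumptions, not any vendor's actual implementation.

```python
import math

FRAME_LEN = 160  # 10 ms of audio at a 16 kHz sample rate (assumed)

def frame_energy(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def gate_frames(audio, threshold=0.02, frame_len=FRAME_LEN):
    """Pass along only frames loud enough to justify waking a
    much more expensive keyword-spotting model downstream."""
    loud = []
    for i in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[i:i + frame_len]
        if frame_energy(frame) >= threshold:
            loud.append(frame)
    return loud

# Demo: a second of near-silence followed by a short loud burst;
# only the burst's ten frames survive the gate.
audio = [0.001] * 16000 + [0.1] * 1600
candidates = gate_frames(audio)
```

In a real device the gate would be far more selective (a small neural keyword spotter rather than a raw energy check), but the division of labor is the same: the always-on stage stays tiny so the big model can sleep.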
Another challenge: No matter the application, consumers will use voice commands only if the performance is consistently reliable. Delivering on that promise wasn’t always possible, given that many devices are used in environments with a lot of background noise, multiple speakers, and distances or obstructions between the speaker and the device. Speech technology vendors have responded with echo- and noise-cancelling technologies, more sophisticated microphones that can pick up and isolate audio signals, far-field speech recognition, and natural language processing that lets users speak more conversationally rather than limiting them to constrained sets of words or phrases.
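One of those techniques, echo cancellation, can be sketched compactly. The normalized least-mean-squares (NLMS) adaptive filter below estimates how the loudspeaker's output leaks back into the microphone and subtracts that estimate, leaving the user's speech; the tap count, step size, and test signal are illustrative choices, not a production tuning or any vendor's algorithm.

```python
import random

def lms_echo_cancel(far_end, mic, n_taps=8, mu=0.5):
    """Normalized LMS adaptive echo canceller (illustrative).

    Learns the echo path from the far-end (loudspeaker) signal to the
    microphone and subtracts the predicted echo from each mic sample."""
    w = [0.0] * n_taps            # adaptive filter taps
    buf = [0.0] * n_taps          # most recent far-end samples
    out = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))   # predicted echo
        e = d - y                                    # cancelled output
        norm = sum(b * b for b in buf) + 1e-8        # avoid divide-by-zero
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out

# Demo: the microphone hears only a delayed, attenuated copy of the
# loudspeaker signal; once the filter adapts, the residual is near zero.
random.seed(0)
far = [random.uniform(-1, 1) for _ in range(2000)]
mic = [0.6 * far[i - 1] if i >= 1 else 0.0 for i in range(2000)]
residual = lms_echo_cancel(far, mic)
```

Production echo cancellers layer many refinements on this core (double-talk detection, frequency-domain adaptation), but the adapt-predict-subtract loop is the common foundation.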
With all the work that is going into improving speech for consumer electronics, it's clear that device manufacturers today understand the value of adding voice activation features to their products. A number of niche solutions have emerged. But building up the ecosystem has been and will remain challenging. Standards will simplify the integration work now required to add such capabilities. How quickly they take hold is a question that should be answered in the coming months.
Paul Korzeniowski is a freelance writer who specializes in technology issues. He has been covering speech technology issues for more than two decades, is based in Sudbury, Mass., and can be reached at firstname.lastname@example.org or on Twitter @PaulKorzeniowski.