The in-car speech experience is moving from basic command and control to natural interactions with automotive assistants
When a vehicle has internet connectivity through a smartphone or an embedded modem, its speech recognition becomes a whole new experience; asking the system to, say, steer you toward nearby parking becomes a realistic option. Connected cars can leverage cloud-based speech recognition, such as Apple's or Google's speech technology, as well as off-board content and intelligent processing such as natural language understanding (NLU) and reasoning technology. Cars without connectivity, on the other hand, rely on factory-installed embedded speech recognition, which seems positively limited by comparison.
Embedded speech recognition started appearing in vehicles during the early 1990s, and most new vehicles are equipped with it. Initially used for voice dialing, embedded speech systems have evolved to support music management, navigation, and various vehicle features such as climate control. But recent J.D. Power user satisfaction scores for embedded speech suggest that the technology struggles to recognize even simple commands.
The main reason for the low satisfaction scores is that in-vehicle recognizers often hear unexpected audio, which leads to unpredictable results. Drivers are expected to speak structured commands, as outlined in the owner's manual, which many drivers never read. For example, a driver can simply say a radio station number to change channels. But if the driver says an address without saying "address" first, anything can happen. Why force drivers to follow rules that have to be learned? Smartphones, by contrast, are designed to handle naturally spoken requests.
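To make the distinction concrete, here is a minimal sketch of grammar-constrained command matching. The command templates and action names are hypothetical, not those of any production system, but the failure mode is the one described above: an utterance that does not fit a template is simply rejected.

```python
import re

# Hypothetical command grammar: an embedded recognizer only accepts
# utterances that match one of a fixed set of templates.
COMMAND_GRAMMAR = [
    (re.compile(r"^tune to (?P<station>\d+(?:\.\d+)?)$"), "radio.tune"),
    (re.compile(r"^address (?P<address>.+)$"), "nav.set_destination"),
    (re.compile(r"^call (?P<contact>.+)$"), "phone.dial"),
]

def parse_command(utterance: str):
    """Return (action, slots) if the utterance matches the grammar, else None."""
    text = utterance.strip().lower()
    for pattern, action in COMMAND_GRAMMAR:
        match = pattern.match(text)
        if match:
            return action, match.groupdict()
    return None  # out of grammar: the embedded system has no fallback

# In-grammar phrasing is recognized as expected.
print(parse_command("address 123 Main Street"))
# → ('nav.set_destination', {'address': '123 main street'})

# Natural phrasing without the "address" keyword is rejected outright.
print(parse_command("take me to 123 Main Street"))
# → None
```

A cloud recognizer with NLU replaces the fixed grammar with statistical models trained on naturally spoken requests, which is why the keyword-first restriction disappears.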
Historically, embedded speech has faced the following constraints: usage behavior cannot be monitored and audio is not logged; the vehicle's CPU and memory are quite limited, even today; infotainment systems continue to evolve and become more complex to control; and updating the recognizer is not practical after the car is sold.
Embedded speech recognizers can't adapt to the driving population's voice patterns, which is a significant disadvantage in terms of optimizing the user experience. The speech algorithms and supporting statistical data are limited by hardware constraints, and once installed in the car, the recognizer is frozen. The result is limited performance, especially in handling out-of-grammar commands. Yet speech is important for safety, and drivers want it; to be fair, after learning what to say and a little practice, most people have no problem using such systems.
Cloud-based automotive speech recognition became a reality a couple of years ago when cars were launched that supported Apple's CarPlay and Google's Android Auto. These infotainment solutions rely on the driver's smartphone for the app and connectivity and also on the vehicle's display. Both solutions offer a better navigation experience than best-in-class embedded vehicle infotainment systems, in part because drivers are already familiar with the intuitive smartphone interface. The overall cloud speech experience goes far beyond what drivers have become accustomed to in the car. Not only does the cloud enable highly accurate, very adaptable speech recognition, it also provides access to content and supports dialing, texting, full navigation, music management, messaging, and Siri-like assistance. The user interface is suitable for driving, and speech input is key to the experience. Neither CarPlay nor Android Auto relies on embedded speech recognition; without internet connectivity, though, the speech function is lost.
The user interaction style for cloud speech is natural—quite different from the embedded speech style. You never hear the classic car prompt: "Please say a command—beep." Instead, the user hears a chime to start the experience; if needed, audio prompting without beeps is used. The natural interaction style should dramatically improve user satisfaction scores.
Earlier this year at the Consumer Electronics Show, Nuance Communications announced that its Dragon Drive—an embedded-cloud speech recognizer with NLU—had launched in BMW’s 2016 7 Series. The hybrid speech solution features a conversational user interface that enables intuitive access to in-car functions and connected services while minimizing distraction. Ideally, a hybrid speech solution offers seamless access to all services and content—no matter where you’re driving. But the vehicle has to stay updated with relevant infotainment information, such as nearby points of interest and other navigation information.
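One plausible way a hybrid system can offer that seamless access is to arbitrate between the two engines at request time. The sketch below is illustrative only, not Nuance's actual design: it prefers a confident cloud result when the vehicle is connected and falls back to the embedded recognizer so core commands keep working offline.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Result:
    text: str          # recognized utterance
    confidence: float  # engine's confidence score, 0.0-1.0
    source: str        # "cloud" or "embedded"

def hybrid_recognize(audio: bytes,
                     cloud: Callable[[bytes], Optional[Result]],
                     embedded: Callable[[bytes], Result],
                     connected: bool,
                     min_confidence: float = 0.6) -> Result:
    """Prefer the cloud engine when connected and confident; else use embedded."""
    if connected:
        cloud_result = cloud(audio)
        if cloud_result is not None and cloud_result.confidence >= min_confidence:
            return cloud_result
    # Offline, or the cloud result was unavailable or low-confidence.
    return embedded(audio)

# Stub recognizers standing in for real engines.
cloud_stub = lambda audio: Result("navigate to nearby parking", 0.92, "cloud")
embedded_stub = lambda audio: Result("enter destination", 0.70, "embedded")

print(hybrid_recognize(b"...", cloud_stub, embedded_stub, connected=True).source)
# → cloud
print(hybrid_recognize(b"...", cloud_stub, embedded_stub, connected=False).source)
# → embedded
```

The same arbitration point is where cached infotainment data, such as recently synced points of interest, would let the embedded side answer queries the cloud normally handles.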
The trend is clearly toward cloud speech, but what about the embedded speech solutions? It would seem that with some clever user interface improvements and a focus on managing only the phone, navigation, and music, user satisfaction scores could improve. But let's face it: we can all expect Google to become the infotainment system as the connected car ultimately cruises toward autonomous driving.
Thomas Schalk is a speech technologist who focuses on automotive user interfaces at SiriusXM. He can be reached at firstname.lastname@example.org.