Beyond Speech in the Car

It is widely accepted that speech interfaces fit nicely into the driving experience, particularly when a task requires text entry. Speech can be used to manage secondary tasks such as navigation, music, phone calls, messaging, and other functions, making it possible to be more productive while driving without the burden of driver distraction. However, actual usage of speech-enabled features has fallen short of expectations, spurring some to blame the low usage on the unreliability of speech in the car. Regardless, keeping the primary task of driving in mind, user interfaces for secondary tasks shouldn't require lengthy or frequent glances, nor much manual manipulation. The ultimate goal is to provide natural interfaces simple enough to allow the driver to enjoy technological conveniences while maintaining a focus on driving.

In today's vehicles, a speech button is commonly used to initiate a speech session that may have visual and manual dependencies for task completion. Once the button is pushed, a task can be selected from a voice menu. Without doubt, the trend is toward freely spoken input, with no boundaries on what a user can say. But even with such advanced capabilities, perhaps a speech button is still not a good idea. Arguably, a visual-manual interface should be used for task selection, and with appropriate icon design, the user experience would be natural. Navigation, music, messaging, infotainment, and other functions can be easily represented with icons.

"Please say a command" is an unnatural prompt, yet it is common today. From a multimodal perspective, we consider speech, manual touch, and gesture to be input modalities (from the driver to the vehicle) and visual (including heads-up displays), sound, and haptic (touch) feedback to be output modalities (from the vehicle to the driver). When sound takes the form of an audio prompt, the prompt should be natural. Examples include "Where would you like to go?"; "Please say your text message"; "What's your zip code?"; and "Say a song name or an artist." Yes/no queries can certainly be both natural and necessary.

A voice menu can be thought of as an audio version of a list of choices. An audio interface can be cumbersome to use for list management, because each item in the list has to be played to the driver, followed by a yes/no query. Complex items take longer to play and are more difficult for the driver to remember. Using a visual-manual interface for list management, by contrast, is much quicker, easier, and more natural. Consider the following use case:

1. Driver taps the navigation icon.

2. Vehicle asks driver, "Where would you like to go?"

3. Driver says, "An Italian restaurant that's not too expensive."

4. Vehicle displays top five search results, including address, distance, and price category.

5. Driver glances briefly.

6. Driver taps to select restaurant.

A similar scenario can be given for managing music and infotainment. In this case, the best practice may be to simply highlight the tapped result, which avoids the tendency to stare at the screen during audio playback. Long glances away from the road are dangerous, with two seconds being the critical upper limit for safety.
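The navigation use case above can be sketched in code as a sequence of steps, each tagged with the party acting and the modality used. This is only an illustration under stated assumptions: the Step class, the function names, and especially the per-step glance times are hypothetical placeholders, not measurements. The only figure taken from the text is the two-second glance limit.

```python
from dataclasses import dataclass

# The two-second limit comes from the text; everything else is illustrative.
MAX_GLANCE_SECONDS = 2.0


@dataclass(frozen=True)
class Step:
    actor: str                    # "driver" or "vehicle"
    modality: str                 # e.g. "touch", "speech", "audio", "visual", "glance"
    description: str
    glance_seconds: float = 0.0   # hypothetical eyes-off-road estimate, not measured


# The six steps of the use case; glance times are placeholder assumptions.
NAVIGATION_FLOW = [
    Step("driver", "touch", "tap the navigation icon", 0.5),
    Step("vehicle", "audio", "ask 'Where would you like to go?'"),
    Step("driver", "speech", "say the destination request"),
    Step("vehicle", "visual", "display the top five search results"),
    Step("driver", "glance", "glance briefly at the results", 1.5),
    Step("driver", "touch", "tap to select a restaurant", 0.5),
]


def steps_exceeding_glance_limit(flow, limit=MAX_GLANCE_SECONDS):
    """Return the steps whose estimated glance time is above the limit."""
    return [step for step in flow if step.glance_seconds > limit]


def modalities_used(flow, actor):
    """Return the distinct modalities one actor uses, in order of first use."""
    seen = []
    for step in flow:
        if step.actor == actor and step.modality not in seen:
            seen.append(step.modality)
    return seen


if __name__ == "__main__":
    print("Driver input modalities:", modalities_used(NAVIGATION_FLOW, "driver"))
    print("Vehicle output modalities:", modalities_used(NAVIGATION_FLOW, "vehicle"))
    print("Steps over the glance limit:", steps_exceeding_glance_limit(NAVIGATION_FLOW))
```

Laying a flow out this way makes it easy to see that speech is used only for the one step that amounts to text entry, while task selection and list selection stay visual-manual, and to flag any step whose estimated glance time would exceed the limit.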

Of course, we still need gestures. Controlling volume with speech is silly when you think about it. It is much more natural to use a gesture, such as turning a knob or pressing and holding a button, usually one found on the steering wheel. We use gesture as an input modality every time we drive: to steer, to accelerate, and to brake (sorry, speech won't work for these tasks). So we should avoid using speech instead of gesture when fine motor skills are required.

Speech in the car should be approached holistically. We should think in terms of a cognitive model for secondary driving tasks, a model that will indicate the best use for speech and other modalities. Simple tasks such as voice dialing can be done with an audio-only interface, combining speech and sound. But when tackling more complex tasks, we can't expect a one-size-fits-all interface.

Thomas Schalk, Ph.D., is vice president of voice technology at Agero, a company that provides telematics services to the automotive market. He is a member of the AVIOS board of directors and a former president of the organization. He can be reached at tschalk@agero.com.
