
Multiple-Modality Disorder


However, the application instructs the user to hold a button and constrains what users might say by modeling the possible queries: "Say a business near you. Or traffic, directions, map, weather, movies." Tellme opted for this approach, according to senior product manager David Mitby, because even when "very solid smartphone users pick this up for the first time, it’s not clear what to do. It’s not intuitive media yet."
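To make that pattern concrete, here is a minimal sketch in Python (not Tellme’s actual code) of prompt-as-grammar handling: the prompt itself enumerates the accepted commands, and anything outside that set triggers a re-prompt. The command set follows the examples above; the function names are hypothetical.

```python
# Sketch of the "model the possible queries" pattern: the prompt lists
# the small set of commands the app accepts, so first-time users learn
# what they can say, and out-of-grammar input falls back to the prompt.

COMMANDS = {"business", "traffic", "directions", "map", "weather", "movies"}

def make_prompt():
    return "Say a business near you. Or say: " + ", ".join(sorted(COMMANDS)) + "."

def handle_utterance(text):
    word = text.strip().lower()
    if word in COMMANDS:
        return f"Launching {word}..."
    return make_prompt()                       # out-of-grammar: re-prompt

print(make_prompt())
print(handle_utterance("weather"))
print(handle_utterance("read my horoscope"))   # falls back to the prompt
```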

User interface designers creating a multimodal interaction face a particular challenge: they have no way of knowing how a user will interact with a specific handset in a specific scenario.

Olvera recently worked on MediVoice, an e-prescription service powered by Nuance Voice Control Healthcare Edition that allows physicians to verbally create and send electronic prescriptions from a range of mobile handsets. With a GUI, the only way physicians can get information is visually. The need to press buttons on their handsets constrains their responses.

Stir voice into the mix, and "that’s where things get very interesting," Olvera says. "You have no control over the user."

During usability tests, Olvera discovered that some physicians treated the voice interface as a turn-taking exchange: they would utter a few quick words, wait for them to register, then say more. Other doctors, however, would give long responses packed with information—the patient’s name, the prescription, the dosage, the pharmacy where the patient should pick it up, and the phone number—all in a single utterance. Still other users would offer quick verbal information, then use the stylus to enter the remaining data.
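All three behaviors can be absorbed by a mixed-initiative, slot-filling dialog loop that accepts however many fields arrive in a turn and prompts only for what is still missing. Here is a minimal sketch in Python; the slot names and the toy "slot=value" parser are hypothetical stand-ins for a real recognizer’s semantic output (or a stylus form), not MediVoice’s actual design.

```python
# Mixed-initiative slot filling: tolerates one slot per turn,
# everything in a single utterance, or any mix of the two.

REQUIRED_SLOTS = ("patient", "drug", "dosage", "pharmacy", "phone")

def parse_turn(text):
    """Toy parser: expects 'slot=value' pairs; a real system would use
    the recognizer's semantic results or a stylus form instead."""
    slots = {}
    for pair in text.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            if key.strip() in REQUIRED_SLOTS:
                slots[key.strip()] = value.strip()
    return slots

def collect_prescription(turns):
    """Fill slots from successive turns, prompting only for what is
    still missing, so one-shot and turn-by-turn users both succeed."""
    filled = {}
    for turn in turns:
        filled.update(parse_turn(turn))
        missing = [s for s in REQUIRED_SLOTS if s not in filled]
        if not missing:
            return filled
        print("Still needed:", ", ".join(missing))
    return filled

# One-shot physician: every slot in a single utterance.
one_shot = ["patient=J. Smith; drug=amoxicillin; dosage=500 mg; "
            "pharmacy=Main St; phone=555-0100"]
print(collect_prescription(one_shot))

# Turn-taking physician: one slot at a time.
turn_taker = ["patient=J. Smith", "drug=amoxicillin", "dosage=500 mg",
              "pharmacy=Main St", "phone=555-0100"]
print(collect_prescription(turn_taker))
```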

"That’s when we realized [interactions are] very driven by context and the familiarity with the user’s situation and the technology of the devices themselves," Olvera says. "When you’re restricted to just a phone, it’s straightforward. Everyone knows how a phone works." On the other hand, not everyone knows how a smartphone works. The same can be said about a medical prescription application.

Nuance tested MediVoice at various hospitals across the country and at various intervals. Doctors already familiar with an online prescription system typically liked it when it was ported to the handset. "They liked the ability of being able to use it on their device without having to learn a new interface," Olvera says. Doctors unfamiliar with prescription systems tended to treat the mobile version as a novelty.

"With this particular application, it’s hard to pinpoint every single issue [in usability]," Olvera adds.

Reconcilable Differences
So how do designers reconcile different devices with different abilities? "With multimodal, the answer is: It depends," Olvera says. "It’s impossible to blindly decide if a VUI is better than a GUI." That’s why designers need to understand which types of information are best conveyed in which format. An obvious example: the ideal form for musical information is audio, whereas maps and location-based services are ideally visual.

Thus, multimodal interface designers need to be particularly canny about which functionality they deliver in which mode. "Functions will not map one-to-one between a phone and a multimodal interface," Olvera cautions. If an application has 10 functions, for instance, it might make sense to offer six of them through a GUI and five through a VUI, with some functions overlapping both modes. "So there will be certain functions that work on one method [instead of] another one," Olvera says.
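As a concrete illustration of that mapping, here is a small Python sketch of a function-to-modality table in which some functions are GUI-only, some VUI-only, and some overlap. The function names are hypothetical, not drawn from any product Olvera describes.

```python
# Function-to-modality map: functions do not map one-to-one across
# modes, and a few are deliberately offered in both.

MODALITY_MAP = {
    "browse_map":      {"gui"},          # visual by nature
    "play_preview":    {"vui"},          # audio by nature
    "search_business": {"gui", "vui"},   # overlaps: type it or say it
    "get_directions":  {"gui", "vui"},
    "check_traffic":   {"vui"},
}

def functions_for(modality):
    """Which functions should the interface expose in a given mode?"""
    return sorted(f for f, modes in MODALITY_MAP.items() if modality in modes)

print("GUI offers:", functions_for("gui"))
print("VUI offers:", functions_for("vui"))
```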

Functionality must be very targeted. Overcomplicate the device, according to Phillips, and users get confused. "We have a fairly rich set of functionality, but we don’t push it so hard on first-time users," he says. "There’s hidden functionality."

Mitby echoes the sentiment. "We could technically do a lot more with speech than we actually do," he says, "but we’d rather target it at these use cases that really matter and focus on problems where we can put a lot of resources instead of being a be-all-and-end-all with speech. We want it to really work for a smaller set of tasks."

For Ghanekar, that meant identifying a key pain point that still exists on mobile devices: text input. He found that users don’t want to type long, complex queries, and voice was the natural fix for that problem.

Ultimately, voice is a tool in the same way a mouse attached to a computer is a tool. "To be honest with you, I’m a little concerned with pushing multimodality as an end to itself," Meisel says. "That’s like technology looking for a solution." The issue, he explains, is figuring out the best way to satisfy customers and help them achieve their objectives.

Yet it’s still more common to see a consumer interacting with a device through a keypad or touchscreen than by voice. Users remain hesitant about speech thanks to years of bad deployments and shoddy technology. Mitby confesses that when he started at Tellme eight years ago, he always felt slightly embarrassed to tell people he worked in the speech industry. But he has also discovered that as interfaces have improved over the years, consumer confidence has grown. Indeed, Microsoft announced in May that its software, previously exclusive to Ford Sync, will be loaded into select Kia and Hyundai cars by November. It’s rumored that the Kia and Hyundai systems will come with additional features, with some bloggers speculating about voice controls for navigation devices and security features.

"[Speech technology] needs adoption, and people need to see it works for tasks that matter," Mitby says. Ultimately, he believes that if there are minor problems, but an application provides consistent value, the user will be forgiving. 

"We’re in a sweet spot where the technology is good enough to produce very good applications that are usable by the average person," says Todd Emerson, director of solutions engineering for Medio Systems, which partnered with U.K.-based speech recognition provider Novauris to design Verizon’s Get It Now service, allowing enabled phones to download certain applications. "As people start using these applications more, the consumer will get smarter about how to use the application, and the technology will get smarter about how to work with the consumer."

Who’s in Control?
Designers occasionally also have to consider carrier control. On North American feature phones, vendors need permission from the telcos to access application programming interfaces (APIs), such as the one for recording audio. That’s why vlingo Find, for instance, is currently enabled on Sprint phones. There’s more openness when designing applications for smartphones.

"Now the problem is: Who’s servicing you?" Olvera states. "AT&T? Sprint? Each one has different bandwidths in different regions with different data transfer speeds. What works in one device in one location may not work for another device in the same location."
