Consider the User and the Device

NEW YORK (SpeechTEK 2008) -- The 2008 conference wrapped up Thursday with the final sessions of SpeechTEK University, a series of three-hour seminars on speech technology topics.

In one afternoon session on using speech and graphics interchangeably in mobile devices, Julie Underdahl and Catherine Zhu, both senior interface designers at Intervoice, led participants through creating a multimodal pizza-ordering application while describing their own efforts to build similar mobile applications.

"Today what we’d like to do is show you where we are in our research and share our successes and failures," Underdahl said. Their research centers on creating an application where VUI and GUI are active at all times—the user can choose which modality to use at any time.

"There are different flavors of multimodal applications," Zhu said, "but what we’re focusing on today is a full multimodal experience."

One thing to remember when focusing on that full multimodal experience is that there are three separate user groups: people who prefer to use VUI, those who prefer GUI, and those who like both.

One advantage of dual modality is a potential increase in usability: users can interact through whichever modality is more relevant at the time, and can switch if the environment changes.

"Some disadvantages are that you also may have user confusion because they don’t know which modality to use or if they can use both or not," Zhu said. "They may just get overload."

Zhu and Underdahl admit that in some ways it seems like a solution looking for a problem. "A big reason we’re here today is to talk about the design of these and the thought processes that go into the design," Underdahl said.

One thing to remember, they said, is that speech is transient, whereas a GUI can be looked at for as long as the user wants. Also, designers of such applications can’t design for a single device; they have to think about all of the different devices the application could be used on.

Zhu and Underdahl showed several demos that they used in various places, such as retail stores. They had people test the applications and then describe what they liked and disliked about using them.

In one application demo, people were told to go through the process of buying a Bluetooth headset. The multimodal prompt said, "As you can see, there are four Bluetooth headsets that will work with your phone. I’ll go over them one at a time. When you find the one you want, just say or touch ‘buy it.’"

However, on hearing the prompt spoken, many people thought it was telling them to touch "by it," so they touched near the picture or text instead of touching the words "buy it."

"It gets complicated and you can confuse your users in ways you don’t expect," Zhu said.

Those in attendance then created a multimodal application for ordering pizza. They made a prompt name for the greeting and main menu of the application, then wrote what the application would say both to the user and on the screen.
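
The exercise described above, naming each prompt and pairing what is spoken with what is shown, might be sketched in Python as follows. This is only an illustration: the `Prompt` class, the `present` helper, and all of the prompt wording are assumptions made for this example, not material reported from the seminar.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Prompt:
    """One multimodal prompt: what is spoken and what is shown."""
    name: str        # identifier for the prompt, e.g. "greeting"
    spoken: str      # text the application says aloud (VUI)
    on_screen: str   # text the application displays (GUI)


# Hypothetical wording; the prompts written in the session are not reported.
PROMPTS = {
    p.name: p
    for p in (
        Prompt(
            name="greeting",
            spoken="Welcome to the pizza ordering line.",
            on_screen="Welcome!",
        ),
        Prompt(
            name="main_menu",
            spoken="You can say or touch: new order, or repeat last order.",
            on_screen="New order | Repeat last order",
        ),
    )
}


def present(name: str) -> Tuple[str, str]:
    """Return the (spoken, on-screen) pair for a named prompt."""
    prompt = PROMPTS[name]
    return prompt.spoken, prompt.on_screen


if __name__ == "__main__":
    for prompt_name in ("greeting", "main_menu"):
        spoken, shown = present(prompt_name)
        print(f"[{prompt_name}] SAY:  {spoken}")
        print(f"[{prompt_name}] SHOW: {shown}")
```

Keeping the spoken and on-screen text side by side under one prompt name makes it easier to check that the two modalities stay consistent, which matters when both are active at the same time.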

At the end of the seminar, Underdahl noted that error handling "is really one of the bigger challenges in these designs. One of the things we thought about is that people might have a higher error tolerance because they have other ways to input their choice."
