Multiple-Modality Disorder

Fall on your knees for the graphical user interface (GUI)! No longer need we hack through a thicket of C-colon-backslashes to access a single file. Today we simply point and click. Or we would if only we could find the proper icon amid our desktop’s clutter. So now there’s a new problem: As the functionality of our electronics increases, our ability to comfortably interact with them decreases. When was the last time you purchased a cell phone without an instruction manual roughly the size of War and Peace? For that matter, try decoding all the colorful little hieroglyphs appearing in your word processor toolbar.

"To me, the beauty of a graphical user interface when it first came out was you could, without an instruction manual, figure out how to navigate somewhere," says Bill Meisel, president of TMA Associates. "Now you go to Microsoft Word, they get clever. They’re trying to stretch the GUI beyond where it’s really useful." While we’re familiar with GUIs because of their ubiquity, it’s exasperatingly obvious that a single graphical modality isn’t always the optimal way to access an application within a program. It’s even more difficult on a mobile device, given the limited screen size.

Nuance’s Voice Control, which layers audio commands over BlackBerrys, Treos, and Windows Mobile devices, was designed to streamline the often-unwieldy controls of smartphones. By depressing a push-to-talk button, users command their phones to find businesses, make calls, or access their calendars, without having to wade through menus and submenus. "The interface advances and innovations that are happening around the mobile phone are incredible right now, and you will find over time that voice will play a large role, an increasing role, as those interfaces begin to expand and grow," says Mike Thompson, vice president and general manager of Nuance Mobile.

That certainly sounds rosy, but there’s a lot about the way humans interact with the confluence of visuals and voice controls that still hasn’t been worked out. "In the area of multimodal, there’s still a lot of design work that needs to take place in order to make multimodal applications truly easy to use," says Bill Scholz, president of the Applied Voice Input Output Society (AVIOS).

Incredible advances in user interface design might be happening right now, but it’s impossible to overlook the nascence of multimodality and the fact that interest in it corresponds with American consumers’ increased mobility and need to stay connected.

Mobile Search
During a keynote at the CTIA Wireless convention in Las Vegas in April, Yahoo! Mobile president Marco Boerries announced the voice enablement of mobile search product oneSearch 2.0. "You just say anything, and we’re going to deliver you the answer," he pledged. OneSearch’s voice application, centered around vlingo’s Find technology, is currently available on the BlackBerry Pearl, Curve, and 8800 series. It has a deceptively simple interface: a single search box in which the user’s utterances appear. If there’s a misrecognition, the user can identify where it occurred and a drop-down box gives him alternative choices, or he can change it by highlighting the mistake and either typing or speaking over it.

"I’m extremely impressed with the quality of the dialogue design at vlingo," Scholz says. "Rather than take voice-only dialogue design and graft pretty pictures onto it, they’ve tried to start from ground zero and re-create a human-computer dialogue mediated by a multimodal handset that really makes sense and makes it easy to use."

One of the early requirements when designing the application, according to vlingo CEO Mike Phillips, was to place text and voice entry at the same level, thereby allowing users to control the application with little training. In other words, Phillips and his team leveraged the way people commonly search on the Internet as well as the way they interact with their handheld typepads. "You don’t have to change [user] behavior that much," Phillips says. "It just gives them one more way to interact."

Yet, while users can input data into the application in fundamentally the same way they would with a Web browser, the way the search results are displayed must be altered to suit the vastly smaller real estate of a handset screen.

"We’re working within the constraints of that system," says Joy Ghanekar, global product manager for Yahoo! oneSearch. "This is where the power of giving an answer instead of Web links comes in." For instance, a user wanting a Red Sox score will get the score on the search results page, instead of being directed to a Web site that may or may not show up properly on the screen’s browser. "Same thing with business listings," Ghanekar adds. "We give the address and phone number."

Of course, Web browsing is relatively standardized. Search engines all maintain a similar format, and versions of Microsoft’s Internet Explorer and Mozilla’s Firefox account for the majority of Web browsers actually in use. But much more variability exists across mobile devices, which makes providing a consistent user experience an ordeal.

Consider that the application environments for mainstream handsets might be either the Java-based J2ME or the C++-based BREW. Smartphones, BlackBerrys, Windows Mobile devices, Palms, and—with the launch of its software development kit—Apple’s iPhones all boast different operating systems. And with the variety of handsets comes a variety of hardware and firmware.

"I think the biggest challenge is that you begin to realize the impact the devices themselves have," says Eduardo Olvera, a senior user interface designer at Nuance. Some handsets, for instance, have push-to-talk functionality. Other devices require the user to press a button once to start recording and press it again to stop recording.

In April, Microsoft subsidiary Tellme announced a location-based search application for AT&T and Sprint that it started shipping on select BlackBerrys and feature phones (the popular industry distinction between feature phones and smartphones is that the latter have a more powerful operating system, which provides a steady platform for application developers). Tellme’s bold slogan, "Say it. Get it," seems to promise functionality similar to that of Yahoo!’s oneSearch. Like oneSearch, the design is minimalist.
