May 3, 2010
By Melanie Polkosky, Human Factors Psychologist & Consultant - IBM/Center for Multimedia Arts (University of Memphis)

The Guitar and the Case of Opportunity Lost

For Christmas, I bought my husband, an unrivaled air-guitarist since our high school days, a popular guitar-playing videogame. It has an air of incredible realism—from strumming the guitar, to watching the animated band play, to being scored by the hypothetical crowd. In all, the game presents a totally immersive, highly motivating user experience. But what prevents complete suspension of disbelief is a frustrating set of menus for navigation. Every time we turn it on, we’re confused about where to go. Seriously? Scrolling menus? Where is the speech that would allow me to talk to my agent and navigate? Why wasn’t he telling me that last song was way past my skill level and I’d never fill an arena that way?

For all of the fun we’ve been having, the game without speech seems like a major opportunity lost—one that I’m reminded of every time one of us straps on the guitar.

This got me thinking about where I wish speech technology were but isn’t (or at least where I haven’t found it). Where do I really want to be able to talk to an object or hear it talk back to me for better usability and usefulness?

So after polling a few friends and considering my own daily existence, here’s my Ultimate Speech Wish List:

1. My alarm clock: It seems so, I don’t know, 1940s to have a staticky radio station jolt me into my daily reality. I wish my alarm clock had a gentle voice that would nudge me awake like Mom did when I was a kid: Melanie. Melanie. It’s time to wake up. “QUIET!” I’d snap. OK, just 10 more minutes, the clock would say, without a trace of judgment that I’m not jumping out of bed.

2. The shower: Shower, 92 degrees, massage. Trickling water to full spray, staccato beats on the wall, pause.…Your shower is ready now. Need I say more?

3. My calendar: Maybe what I really want is a nagging personal assistant who could keep me approximating my crazed daily schedule. Hard stop in 10 minutes to take Alex to school, or you’ll have to reschedule your conference call.

4. My mobile running software: I have an iPhone application that tracks my runs, telling me how far I’ve gone in a monotone, oddly concatenated voice: TEN kee-LOM-ee-ters…EE-le-VEN kee-LOM-ee-ters. It shows my route map, distance, and time when I’m finished. I’d quite simply adore this app if the voice were a little more enthusiastic and maybe even, you know, motivating: Wow! TEN kilometers! Only three more to go! Great job!

5. TV recording and schedule: I’m totally over scrolling through hundreds of channels and thousands of shows to find the one or two I actually want to watch. I’d rather just tell my TV what to record. “American Idol, skip the commercials,” I’d say smugly. While I’m at it, I want speech for picking movies on pay-per-view, too.

6. Quick daily updates: I like to get quick updates throughout the day about what the market is doing, and how many email and voicemail messages have come in. Instead of clicking all over, it would be nice if I could get quick (yeah, I mean quick!) updates on all of these things in one sweep: The Dow’s down 103 points. Fifteen new emails. No voicemails. Update over, back to work.

7. My freezer and refrigerator: I just completed the periodic big sweep of the mystery bricks in my freezer. I’d love a sympathetic-but-admonishing voice to talk to me about it, like, Sorry, but those green beans are no longer edible; it has been six months since you put them in. It’s a science project now. Please discard.

My critical point is this: With the heavy focus on speech in cars and call centers, it seems as if we’re overlooking far more useful places where speech could collapse the hierarchical menus that we trudge through every day or just give us an easier way to control things that now require levers and buttons. In real life, I generally don’t care much about being productive while I’m driving. I just want to get where I’m going alive.

From a designer’s perspective, what my wish list items have in common is that they are all highly goal-directed, achievable within about three turns of dialogue, limited to small vocabularies, static, and repeated frequently. All of these features make them perfect candidates for incredibly successful speech interaction. Yes, maybe they’re boring, but so what? They wouldn’t be once speech is meaningfully incorporated; they’d be awesome! So I’d like to challenge designers and the entire industry to seek out those mundane, everyday spots of human-thing interactions that we’re currently overlooking. In the meantime, I’m going back to practicing the guitar and writing a script for my agent.

Melanie Polkosky, Ph.D., is a social-cognitive psychologist and speech language pathologist who has researched and designed speech, graphic, and multimedia user experiences for more than 12 years. She is currently a human factors psychologist and senior consultant at IBM. She can be reached at polkosky@comcast.com.
