Voice Searching for an Interface

SAN DIEGO -- Today’s opening panel at Voice Search 2008, moderated by TMA Associates president Bill Meisel, gave occasionally conflicting views on the current business and technological landscape for voice search in directory assistance and mobile applications.

"People are pretty well aware that the nature of what you search for on your mobile is different than what you search for on your desktop," says Michael Cohen, manager of Google’s Speech Technology Group. "On your mobile, you deal with immediate needs, location-based, things that need immediate satisfaction." Michael Wehrs, Nuance’s vice president of evangelism and industry affairs, cites a company study showing that people are most likely to use voice when searching on the go and that food is often the top query.

But voice search, Wehrs emphasizes, is only "a way to get something done. People may choose the modality of voice to get information. The relevance of it, the flexibility of it, become interesting components in how you measure how good it is."

Panelists agreed that the industry tends to regard voice as an application when it should be perceived as an interface. They disagreed, however, on how users will adopt that interface. "Directory assistance is the way in, where the masses will integrate," Wehrs says. "We see it in our product offering."

Victor Melfi, VoiceBox Technology’s chief strategy officer, disagrees. Directory assistance services, like 411, suffer from "the problem of deeply nested data." He cites an incident in which it took him two minutes to get a single stock quote on an enhanced 411 service.

Melfi, however, is in the minority. Neal Bernstein, Microsoft’s senior director of local and mobile search, argues that nested data has always been an issue with IVRs in general. "From the Microsoft point of view, we’re going to continue to…take more and more advantage of personalization features," Bernstein says. "So [voice search] can hopefully figure out what’s the user’s intent to eliminate the frustration of nested menus." This might mean using the mobile phone’s screen and pairing a GUI with the voice interface, a combination developers are still trying to reconcile. "We're at the baby steps of integrating multimodality," Wehrs says.

Cohen emphasizes that voice isn't optimal for all scenarios. "We mustn't become too voice-centric," he advises.

Yet all these ideas on how best to incorporate a voice search interface overlook a rather overwhelming issue: Speech recognition itself might not be good enough.

While Cohen agrees with Melfi that a workable interface is "the single biggest bottleneck," he has additional issues with recognition accuracy. "We’ve had incremental progress for years, but it’d be nice to see some big breakthroughs," he says.

But John Tadlock, AT&T’s lead technical architect on consumer application architecture, wonders instead whether the current business landscape inhibits this technological advancement. "What service has a mass-market appeal and a business model that works?" he asks. "If you could answer that question, then you’ll get all the technology people making sure that [the technology] works."

Given the amount of money in telecommunications, it’s difficult to convince providers, who hold the purse strings, to let developers experiment too drastically with the user interface.

"I’ll agree that the number one problem that needs to be resolved is the business problem that lets us make enough money to continue this," Wehrs says. "It’s hard to get innovation when you’ve got so many controls placed on you by others."

Entire transcript posted at SpeechTechBlog.com
