Mobile Speech Promises Glittering Prizes But Also Serious Challenges
NEW YORK (SpeechTEK 2011) — “It used to be that we had a mobile business unit, but that’s losing meaning. All of our business is becoming mobile,” said Vlad Sejnoha, senior vice president and chief technology officer of Nuance Communications, speaking Tuesday at SpeechTEK 2011.
Sejnoha’s comment marked one of the most revealing moments of the morning’s keynote panel, “Mobility—A Game Changer for Speech?” and one that was echoed by many of the panelists, a group of high-level technologists and business strategists from major speech concerns like AT&T and West Interactive.
For his part, T.J. Leonard, director of product marketing and customer business at Vlingo, believes that mobile speech is on the cusp of a new maturity. Speech was once an afterthought, tacked onto phones alongside other features that many cellphone owners might never use. Now smartphones, and the challenges inherent to them, have given speech a much clearer value on the cellphone.
“The interesting thing about speech and mobile has been voice search,” Leonard explains. “Voice was trying to replace something that worked relatively well, like typing with your thumbs. We’re kind of on the cusp of a whole new set of problems that have been introduced with mobility.”
Sejnoha pointed to “app creep,” the accumulation of dozens of smartphone applications on one phone. Given the size of most smartphones, it becomes tedious to navigate through as few as 20 applications, and the problem worsens as the apps mount. But Sejnoha believes this is a problem that speech is particularly adept at handling.
“Imagine you have 100 icons from different providers,” he says. “That’s unmanageable. What speech has is the ability to overlay a visual environment and pull things without having to be seen.
“Speech can be built to complement and enhance other existing modalities,” he adds. “It’s a non-linear overlay. We’re really entering a renaissance era [for speech].”
For all the opportunities that mobile creates, it also poses challenges for speech. Accuracy will always be an issue, says Mazin Gilbert, the executive director of research at AT&T. “Twenty years from now, we’ll still be talking about that,” he says.
But mobile is particularly problematic because people expect their voices to be understood in increasingly noisy places. Telephonic speech recognition technology was originally designed with landlines in mind. Speech engineers imagined their users at home or in offices, both relatively quiet places. Now users are calling from the road, streets, bars, malls, and ballgames. There are technological solutions, Gilbert notes, but they go only so far.
Also proving difficult is getting users to trust speech technology. Years of less-than-ideal voice recognition in automated interactive voice response systems have eroded confidence.
“You hear again and again that ‘I’ve tried that before. It doesn’t work with me,’” says Leonard. “It’s harder because it’s not a new technology... You can’t promise a big change and fall short, providing only incremental change. We’re in a dangerous place because of preconceived notions.”
Another challenge is that the industry hasn’t reached any consensus on how to make money in the mobile field. “Some apps are subscription-based,” Gilbert explains. “There are a lot of apps out there that are ad-based. The economics are something we can’t avoid. Some of us come from companies where we’re forced to think of that day in and day out. We’re working on it, but I’ll admit that we haven’t figured that out yet.”
One thing is clear, however: Whoever manages to crack the mobile speech nut first stands to make a lot of money.