Mobile Speech Promises Glittering Prizes and Serious Challenges
NEW YORK (SpeechTEK 2011) — “It used to be that we had a mobile business unit, but that’s losing meaning. All of our business is becoming mobile,” said Vlad Sejnoha, senior vice president and chief technology officer of Nuance Communications, speaking at SpeechTEK 2011 in August.
Sejnoha’s comment marked one of the most revealing moments of the morning’s keynote panel, “Mobility—A Game Changer for Speech?” and one that was echoed by many of the panelists, a group of high-level technologists and business strategists from such major speech companies as AT&T and West Interactive.
TJ Leonard, director of product marketing and customer business at Vlingo, believes mobile speech is on the cusp of a new maturity. Speech was once an afterthought, tacked onto phones alongside other features that many cell phone owners might never use. Now, smartphones and the challenges inherent to them have given speech a much clearer value on mobile devices.
“The interesting thing about speech and mobile has been voice search,” Leonard explained. “Voice was trying to replace something that worked relatively well, like typing with your thumbs. We’re kind of on the cusp of a whole new set of problems that have been introduced with mobility.”
Sejnoha pointed to “app creep,” the accumulation of dozens of smartphone applications on one phone. Given the size of most smartphones, it becomes tedious to navigate through as few as 20 applications, and the problem worsens as the apps mount. But Sejnoha believes this is a problem that speech is particularly adept at handling.
“Imagine you have 100 icons from different providers,” he said. “That’s unmanageable. What speech has is the ability to overlay a visual environment and pull things without having to be seen.
“Speech can be built to complement and enhance other existing modalities,” he added. “It’s a non-linear overlay. We’re really entering a renaissance era [for speech].”
For all the opportunities that mobile creates, it also poses challenges for speech. Accuracy will always be an issue, said Mazin Gilbert, executive director of research at AT&T. “Twenty years from now, we’ll still be talking about that.”
But mobile in particular is problematic because people want their voices to be understood in noisy places. Telephonic speech recognition technology was designed with landlines in mind. Speech engineers imagined their users at home or in offices, both relatively quiet places. Now users are calling from the road, streets, bars, airports, malls, and ballgames. There are technological solutions, Gilbert noted, but they go only so far.
Also proving difficult is getting users to trust speech technology. Years of less-than-ideal voice recognition in automated interactive voice response systems have eroded confidence.
“You hear again and again that ‘I’ve tried that before. It doesn’t work with me,’” Leonard said. “It’s harder because it’s not a new technology.... You can’t promise a big change and fall short providing only incremental change. We’re in a dangerous place because of preconceived notions.”
Another challenge is that the industry hasn’t reached a consensus on how to make money in the mobile field. “Some apps are subscription-based,” Gilbert said. “There are a lot of apps out there that are ad-based. The economics are something we can’t avoid. Some of us come from companies where we’re forced to think of that day in and day out. We’re working on it, but I’ll admit that we haven’t figured that out yet.
“One thing is clear, however: Whoever manages to crack the mobile speech nut first stands to make a lot of money,” he added.