
Upwardly Mobile: Speech Answers the Wireless Call


Speech in mobile phones is nothing new. But following a bevy of announcements at April’s CTIA conference in Las Vegas, the technology is poised for a much more thorough integration into mobile devices. In addition to heavyweights like Nuance Communications, smaller start-ups like SpinVox, YouMail, and SR Virtual also pitched new products and services before and during the conference.

Though voice controls remain a popular feature in directory assistance, GPS, and basic control functions, the push for growth in speech-to-text (STT) for voicemail has generated real user interest. Investors also took heed and answered with their checkbooks. British visual voicemail provider SpinVox, for example, received more than $100 million in funding from investors, including Goldman Sachs. Though it provides its services through 12 mobile carriers, the company says it plans to double that figure in coming years and invest heavily in a new speech recognition research center.

"We’re going from being a development company where we take the best technologies and techniques available," says Jonathan Simnett, SpinVox’s director of global communications. "Now that system is so advanced we’ll be a research and development company. We’re absolutely pushing the edge with what we’re doing now."

In an increasingly competitive landscape, new companies are also being forced to offer additional services or features to differentiate themselves. Custom voicemail provider YouMail announced it would add visual voicemail to its lineup of services. The twist? During rollout of the STT service, YouMail, which is primarily user-driven and functions as an interactive online community, allowed customers to rate and correct voicemail transcriptions.

YouMail’s visual voicemail, which is currently a free service running in beta, treated user input as a means of differentiation, according to Alex Quilici, YouMail’s CEO.

"You can rate [the transcription] like you would a YouTube video," Quilici states. "If you rated it poorly, you get an opportunity to edit it, and that feedback is sent back to our engine. Over time, if we notice the same mistakes being made, we can fix them."

Perhaps the most recognizable company to enter the visual voicemail fracas is Nuance, which used CTIA to announce the aptly named Voicemail to Text. This partially automated service leverages the company’s Dragon NaturallySpeaking engine and 3,000 transcriptionists.

Transcription services are not new to Nuance, which has maintained both Dragon NaturallySpeaking and a hidden-operator transcription service for more than a decade. In fact, Dragon NaturallySpeaking is one of the factors that distinguish Nuance's mobile STT services from those of competitors like SpinVox, according to Mike Thompson, general manager and vice president of Nuance's Mobile Division. "As I understand it, most of our competitors don't use speech recognition at all or are just entering the space, largely from pressure from us," he says. "To make this a fully scalable, mainstream service, automation plays a critical role."

Datamonitor senior analyst Daniel Hong agrees that Dragon NaturallySpeaking gives Nuance a significant edge. "Because they have Dragon, which has been around for a long time, the grammar and vocabulary will be larger," he says.

New to Mobile
It’s notable, however, that despite Nuance’s experience in creating STT-centric applications, it is only now adapting the technology for mobile voicemail applications.

For Jim Larson, an independent consultant and VoiceXML trainer, speech applications in mobile handsets are long overdue—not just from Nuance, but from all vendors. "I don’t really understand what’s holding vendors back from having applications on the phone with their mobile devices that use speech input, speech recognition," he says. "I think it might be that users have used speech recognition before and found that it didn’t work very well and are afraid to try it this time around."

Thompson points to greater carrier demand because of a push from the mobile phone user base for unified communications. Voicemail as a stand-alone dial-in product, he says, is out of date, especially in a technological landscape where users already receive SMS and email on their mobile devices. He cites Nuance-conducted surveys that revealed nine out of 10 consumers would like to see voicemail delivered as text.

"With Nuance, we like to evaluate the business potential and possibilities of a given market," he says. "So we took a slower approach while we evaluated the potential of the business. We think the start-ups have done a good job to create some energy in the space. We know the carriers are very interested. There's a great deal of demand. We're applying what is telco-grade, proven technology, scalable technology, to attack this market and serve this market in a big way."

Nuance, SpinVox, and SimulScribe are among the larger companies attacking visual voicemail. However, other vendors, like Los Angeles-based start-up SR Virtual, are also setting up camp and asserting themselves with a varied range of offerings. In September, SR Virtual will launch the beta version of SendChat, a downloadable application that allows users to dictate text messages or receive voicemail as text. The service also hosts VoIP calls and will add GPS and 411 services, according to owner Ray Galan.

And because the application runs from the Internet, dictation and transcription services are accessible from any phone or network. Galan believes this versatility will attract the smart phone and pocket PC market as well as the rest of the mobile market.

"We feel that our market is open, and there is definitely room to dominate and attain exposure," Galan says, "especially since our software is uniquely different."


The Weigh In
What sort of applications in mobile handsets will propel speech into mainstream adoption?

"Those supporting sophisticated information search and retrieval, designed for entertainment, or targeted toward learning and training. The latter represents a vast, unexploited area where the mobile handset can facilitate learning languages, augmenting elementary or secondary education courses, or even the taking of tests."
– BILL SCHOLZ, PRESIDENT, AVIOS

"As more municipalities and governments forbid the use of mobile voice (and data) while driving, presence- and location-based services will drive voice recognition technology into heavy use."
– STEVE HILTON, VICE PRESIDENT OF ENTERPRISE AND SMB RESEARCH, YANKEE GROUP

"Voice search that offers a simple, unifying paradigm: just say what you want and get it, using a single search box."  
– BILL MEISEL, PRESIDENT, TMA ASSOCIATES

"Mobile devices will enable the wide adoption of applications involving speech from wherever the user is, not just from the user’s workstation. These include productivity, communications, informational, educational, and entertainment."  
– JIM LARSON, CONSULTANT AND VoiceXML TRAINER


Control Freaks
Visual voicemail isn’t the only application that has seen increased uptake in mobile handsets. Speech controls remain a point of excitement for many in the industry, especially given the increasing multimodality of many handsets. Operating a phone by voice is particularly useful in hands-busy, eyes-busy environments like driving.

Though adoption has been gradual, uptake is definitely increasing. Nuance’s VSuite, which outfits a handset with voice commands for name or digit dialing, contact lookup, and application launches, has already shipped in more than 200 million phones and will be embedded in AT&T’s Palm Centro thanks to a deal signed in early April.

"At present, speech recognition is a critical application on a phone," says Mike Thompson, Nuance Mobile's general manager and vice president. He concedes, however, that speech controls still need to prove their worth to the public. "A fully integrated interface that optimizes the best of voice recognition, predictive text, and GUI interfaces is where this market is headed," he continues. "There are numerous use cases where any one of them might be the optimum interface for that particular moment."
– Ryan Joe

