Competition Heats Up
Certain instances make inputting information downright difficult: Maybe your hands are not free, or you are working with a small device, like a cell phone. In such cases, a voice recognition system would make inputting data easier. A person could look up information about local businesses, sports scores, driving directions, flight itineraries, stock quotes, or updated weather reports, and the response could be sent as a digital image, text message, or even spoken back to the user. Ideally, such a system would adapt to the unique aspects of each user’s voice over time through a built-in feedback system.
In addition to the benefits offered to consumers, companies could use voice input to make their e-commerce sites more appealing, and call center agents could rely on it to find needed information faster.
That is the promise of voice search, which has garnered interest from the computer industry’s most influential companies, namely Google, Microsoft, Apple, and Yahoo!. In addition, speech recognition suppliers, such as Novauris, Nuance Communications, and Vlingo, have been busily building new voice search products. Cellular carriers, such as AT&T and Verizon, have been dabbling with the technology, and even car companies, such as Kia and Ford, have been integrating these features into their vehicles (see “Voice Search Takes to the Highway”). “Eventually voice will become more than just a way to perform functions, such as search; it will become a general-purpose interface to information stored on a number of different devices,” predicts Mike Phillips, co-founder and chief technology officer at Vlingo.
While there has been a great deal of interest in this area, the technology is still in a nascent state of development. The number of implementations has been small, with most systems in V1 iterations for a few select applications. Voice search systems face challenges, such as improving their accuracy rates and providing more satisfying user experiences. The work needed to deliver functional voice search systems is complex and costly, and the various parties have found it challenging to determine how to monetize their investments. None of these hurdles are insurmountable, but right now no one is sure how—let alone when—they will be resolved.
Much of the early focus in the voice search arena has been on mobile phones, primarily because the 4 billion handsets in use today represent an alluring target. Convenience, or rather inconvenience, is another driver. “The keyboards available with cell phones are not easy to use,” notes Mike Thompson, senior vice president and general manager at Nuance Mobile. Voice could make entering text less arduous.
When it comes to search, Google has the most significant mindshare: It’s already the top Internet search provider, and no doubt it wants to become a top voice search supplier as well. “Google has and will continue to expend significant money and resources to make sure its voice search features work,” predicts Stephen Arnold, managing partner at ArnoldIT.com.
Google co-founders Larry Page and Sergey Brin have filed a number of voice search patents. Google’s Mobile App, which runs on its own Android mobile phone operating platform, as well as BlackBerry and iPhone devices, relies on proprietary voice recognition technology to convert spoken questions into Web searches. With Google’s voice application, a user can bring the phone to his ear to activate the voice prompt.
Google’s initial focus has been on local search. If a consumer needs to locate the nearest coffee shop, then he would ask, “Where is the closest Starbucks?” and Google Search would produce results. A second thrust is integrating the voice search functions with Google Maps so the search function could identify pizza parlors in San Francisco, and then the mapping function could identify the caller’s location and provide turn-by-turn directions to get to the nearest one. Such capabilities are in an early stage of development and available on Android and select BlackBerry mobile devices. The mapping service now works only with English, but Google plans to add more languages.
Google, however, is facing stiff competition in the mobile phone arena. Apple reportedly is in talks with Microsoft to replace Google as the default search engine on its iPhone, signifying growing tension between Apple and Google. In early January, Google also introduced the Nexus One smartphone, a voice-enabled mobile device that runs on the Android platform; the device was supposed to be Google’s answer to the ubiquitous iPhone (see related story in this month’s FYI section). Some of the voice-enabled features on the new Nexus One phone, which was manufactured for Google by HTC, include:
- a voice-enabled keyboard for all text fields;
- the ability to tell your phone what you want it to do;
- the ability to search Google, call contacts, or get driving directions by just speaking into your phone;
- the ability to get transcribed voicemail with Google Voice integration; and
- text-to-speech output for location-based and mobile search applications.
Microsoft Weighs In
Another industry behemoth, Microsoft, moved into the voice search market in March 2007 with its acquisition of Tellme Networks. Tellme’s initial focus was on using voice search to automate call center processing. Companies such as American Airlines, Domino’s Pizza, E*Trade, and Merrill Lynch use the system for customer service applications. Carriers were another area of emphasis for Tellme: The company developed business search on 411 calls for network operators. At the time of Microsoft’s purchase, Tellme had raised more than $238 million in venture capital, and was generating more than $100 million in revenue.
Since the acquisition, Microsoft has integrated Tellme’s underlying technology into a number of applications. With its Live Search 411 service, customers can use toll-free lines from any phone to find and/or connect to a local business or to hear local information, such as weather updates, movie show times, or airline information. Cell phone users can also receive links to traffic maps from Live Search 411. In addition, the service uses global positioning system data to provide location-aware searches for customers. Initially, Microsoft focused on its Windows Mobile platform, but it recently opened up its services to BlackBerry devices. In December, the vendor added the Bing for Mobile application, which supports Windows-based smartphones.
Not to be left out in the cold, Yahoo! offers its own voice search product, oneSearch, which is available on 80 mobile device types, including Apple’s iPhone. The search company, whose voice search offering is powered by Vlingo, says that while most mobile voice recognition systems are specific to vertical categories, such as local listings, oneSearch delivers results for many kinds of inquiries.
Partners are important in this emerging field. Yahoo! is trying to woo developers with its Blueprint platform, which enables content creators to write apps once and have them run on multiple platforms. In addition, Yahoo! signed a pact with AT&T to provide search for the BlackBerry, Nokia, Windows Mobile, and iPhone devices it supports.
In addition to the consumer search market leaders, many voice recognition vendors have been trying to capture the growing interest in voice search by developing products for that market. Nuance has a couple of initiatives. In a one-two punch for iPhone voice recognition applications, the vendor announced Dragon Search, a speech-to-search application, and Dragon Dictation, a voice dictation tool. Dragon Search translates voice queries into text and supports Web searches from a variety of search engines, including Microsoft’s Bing, Google, and Yahoo!, and popular Web sites, such as iTunes, Twitter, Wikipedia, and YouTube.
Nuance also offers the Nuance Voice Control system, a customizable, modular framework that allows operators and handset manufacturers to speech-enable applications and mobile devices. Possible applications include voice-activated dialing, free-form Web search, text or email message addressing and dictation, music search, navigation, games, and social media.
Vlingo is another supplier focused on voice search development. Its system offers users the ability to send text and email messages, call contacts, search the Web, and update their Facebook or Twitter statuses by speaking into their phones. Vlingo is available for Apple’s iPhone, certain BlackBerry smartphones, Nokia S60 phones, and select Windows Mobile phones.
Finally, Novauris Technologies developed NovaSearch Compact, an extension of its large-scale voice search technology designed for handheld and embedded devices, such as smartphones and navigation products. The initial focus has been on location-based applications, such as finding an address for a restaurant or corporate office.
While the voice search market has seen a lot of activity, it faces both technical and business obstacles. “We have to make sure that the user interface is quite simple even though the applications behind it are very complicated,” notes Yoon Kim, CEO at Novauris.
The most significant hurdle has been designing voice recognition systems that can fit into very small form factors. Servers have much more processing power and memory than cell phones; a typical U.S. address search needs around 10 megabytes of active memory and processor clock speeds in excess of 400 megahertz to provide feedback in a timely manner, according to Kim. The vendors of these technologies have tried to use the Internet to solve this problem. In most cases, they embed a small set of functionality onto the user device and connect it to more sophisticated back-end processors that try to figure out what the user is saying—which is not easy.
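The division of labor described above can be pictured with a minimal Python sketch. It is an illustration only, not any vendor's design: the command list, the route function, and the backend_recognize stand-in are all hypothetical, and the sketch works on text that has already been transcribed, whereas a real handset would ship audio to the back end.

```python
# Hypothetical on-device command set; a real handset vocabulary would differ.
LOCAL_COMMANDS = {"call", "dial", "open", "play"}


def backend_recognize(utterance):
    """Stand-in for a network call to a server-side recognizer (assumption)."""
    return {"handled_by": "server", "query": utterance}


def route(utterance):
    """Handle short commands on the device; send everything else to the back end."""
    words = utterance.lower().split()
    if words and words[0] in LOCAL_COMMANDS:
        # Small, fixed vocabulary: cheap enough to resolve on the phone itself.
        return {
            "handled_by": "device",
            "command": words[0],
            "argument": " ".join(words[1:]),
        }
    # Open-ended requests need the larger models that live on the server.
    return backend_recognize(utterance)
```

In this sketch, route("call mom") is resolved on the handset, while a free-form request such as route("pizza parlors in San Francisco") is passed to the server stand-in.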
“Voice search system accuracy is in the 65 percent to 70 percent range now,” ArnoldIT.com’s Arnold says. Rates are much lower for individuals who speak with accents or local dialects. In these cases, accuracy often falls below 50 percent, so it becomes hit or miss whether the system will correctly recognize a voice input.
Lower recognition numbers also stem from the way users input information. Speaking into a cell phone is not as simple as speaking into a landline phone: Users might not know where the microphones are, for example, or they might hold the devices too close or too far away from their mouths. They might shout into the systems or pause for a period of time as they figure out what to say. Their input could also be encumbered by background noise.
No Common Foundation
Overcoming recognition issues can be difficult for a number of reasons. “There are no common design points with mobile systems,” Vlingo’s Phillips says, noting that each time a vendor develops a way to ensure speech is recognized on one handheld, it is uncertain whether those gains can be replicated on other systems.
In addition, actual user input is much more scattershot than in typical customer service scenarios. “A customer service voice recognition system may have a 10,000-word vocabulary while a local business search system could have 25 million listings,” explains Mazin Gilbert, an AT&T Labs researcher. Consequently, companies need to have a wider range of possible input options, and build vocabularies to support all of them.
Quite often with voice searches, the user—not the system—is the problem. A user might not know the correct name of the business she is trying to find. Even though the system correctly identifies what the user has said, it does not deliver the desired information, leaving the customer to think the voice recognition failed.
Vendors have been searching for different ways to overcome such problems. One is presenting users with a variety of options from which to choose. Similar to a Web search, a voice search could return 10 possible listings. “Accuracy has been hard for us to gauge,” admits Jay Wilpon, a researcher at AT&T. “In some cases, users seem to get the information desired—they click on a button or take another action—but do not come back. In other cases, they abandon the search, come back another time, and eventually become our power users.”
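The idea of returning a short list of candidates rather than a single answer can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a description of how AT&T or any other vendor ranks results: the listing names are invented, the candidate_listings function is hypothetical, and the standard library's difflib string similarity stands in for whatever acoustic and language-model scoring a production system would use.

```python
import difflib

# Hypothetical listing names, invented for illustration only.
LISTINGS = [
    "Starbucks Coffee",
    "Star Bakery",
    "Starlight Diner",
    "Stark Auto Repair",
    "Peet's Coffee",
    "Tony's Pizza",
]


def candidate_listings(transcript, listings=LISTINGS, max_results=10, cutoff=0.4):
    """Return up to max_results listings that resemble the transcript, best first."""
    # Compare in lowercase, but hand back the original spelling of each listing.
    by_lower = {name.lower(): name for name in listings}
    matches = difflib.get_close_matches(
        transcript.lower(), list(by_lower), n=max_results, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]
```

A caller who says only "starbucks" would get several similar-sounding listings back, with the closest textual match first, and could pick the intended one. Lowering the cutoff widens the list at the cost of more irrelevant entries.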
A Money Maker?
Even though vendors are pumping a lot of time, money, personnel, and effort into figuring out how to make sure their users have rewarding experiences, no one seems to know how they will recoup their investments. “The industry seems to be content to move ahead with development now and figure out how to make money later as the market evolves,” AT&T’s Gilbert notes.
No one seems to think users will be willing to pay extra for voice search alone, so a variety of other business models have emerged. Google continues to focus on using search to expand its advertising-based revenue model, and Yahoo! is following a similar path. Apple is also exploring ways to manage ads displayed on the iPhone. Microsoft is doing that as well, but is also focusing on its traditional software licensing sales. Nuance and Novauris are selling their products to device manufacturers, carriers, and large companies that have incorporated their technologies into various services. Meanwhile, Vlingo is concentrating on the consumer voice search market. It has one version of its system that users can download for free, and a second that costs about $20 and translates items, such as email messages, into voice messages.
Carriers have followed three different paths to generate revenue from voice search. First, they are using it to improve sales of their own products. Novauris’ NovaSearch technology has been woven into Verizon Wireless/Medio Systems’ Get It Now Search, giving subscribers voice access to the Verizon digital media catalog. Other carriers are offering voice search as part of their premium services. For instance, AT&T integrated the technology into its Navigator service, which costs $9.95 a month and provides users with real-time driving directions. Still other carriers charge third-party application developers to incorporate voice search into their mobile applications.
How quickly voice search will take root is unclear. Some progress has already been made. Nuance says 100 million phones have been outfitted with its voice search features. Vlingo, which was founded in 2007, says its software has been downloaded more than 3 million times, and that more than 100 million voice searches were performed by its customers in 2009 alone. However, those lofty numbers do not necessarily mean that many voice search applications are available or many individuals find them helpful.
For the moment, the voice search industry faces a great deal of uncertainty. However, the fact that Google, which has become the IT industry’s bellwether, and other well-established companies are putting so much emphasis on it seems to indicate that mobile voice recognition systems eventually could usurp Web browsers as users’ most popular interface to the Internet.
Sidebar: Voice Search Takes to the Highway
Mobile phones and automobiles have been like oil and water.
“A number of states have passed or are considering laws that make it illegal for individuals to use their cell phones and drive,” says Yoon Kim, CEO at Novauris. These states want drivers’ hands on the wheel, not on their handsets. A voice input option has the potential to solve this problem and, therefore, has piqued automobile manufacturers’ interest.
Ford has been at the head of the pack, working with Microsoft and Nuance Communications to deliver Ford SYNC, a voice-activated, in-car communications system, in 2007. The system started out helping drivers find their favorite songs on their iPods or satellite radio systems, and then gradually gained additional capabilities, such as roadside assistance and concierge services. At the end of 2009, the company announced a version that features an in-car Wi-Fi system powered by a USB mobile broadband modem, effectively turning the car into an Internet hot spot (see related story in this month’s FYI section).
Consumers seem interested in such capabilities. Ford vehicles with SYNC sold twice as fast as models without the system, according to Brigitte Richardson, speech systems lead engineer at Ford. In fact, 80 percent of the vehicles Ford sells in North America are equipped with SYNC, and more than 1 million SYNC-equipped vehicles are now on the road.
In response, other car companies are adding similar features to their automobiles. Kia Motors, which plans to start rolling out voice search features on its vehicles in the summer, worked with Microsoft on a SYNC competitor, dubbed UVO. This system enables users to answer and place phone calls, receive and respond to text messages, access music from a variety of media sources, and create custom music experiences.
The number of cars equipped with voice response systems is expected to increase dramatically in the next few years. Somewhere, the creators of the 1980s TV show Knight Rider are smiling.