IBM Puts Speech on Display

It has been 10 years since IBM first introduced a dictation product into the voice technology market, and that product has served as a stepping stone toward changing the way people interact across many modalities, IBM executives said earlier this week at a special IBM Speech Technology Innovation Day in New York.

Speech technology has finally arrived "as a staple of everyday life and is only going to become more pervasive," said David Nahamoo, Big Blue's chief technology officer for speech technology research. He noted that IBM speech software can now help people control their car radios, transform the way people receive healthcare, help children learn to read, make driving safer, help companies gain new business insights and improve customer service, and allow people to communicate in foreign countries.

Technology limitations, like background noise and an inability to decipher dialects, once served as roadblocks but have since been overcome, he said, and IBM is now sharing those capabilities with clients and partners. "Through collaborations with our partners, speech technology has become ubiquitous and is transforming the way we work and access information, by making it available, literally, for the asking," he said.

The Speech Technology Innovation Day served as an opportunity for IBM to showcase some of those partnerships and collaborative efforts. Among them, the company announced deals with Pioneer Electronics, Alpine Electronics, and Johnson Controls, all of which will include IBM's Embedded ViaVoice technology to speech-enable their automotive navigation and entertainment systems.

Pioneer's AVIC-Z2 system contains advanced voice recognition and voice guidance technologies that allow drivers to use voice commands to control the audio, video, and navigation systems in their cars. Users with the optional ND-BT1 and a Bluetooth-enabled cell phone can wirelessly dial their phones and speak hands-free through a built-in microphone and the vehicle's speakers. A 10-gigabyte partition on the hard drive stores music, which can be accessed or searched by saying an artist name, song title, album title or genre. An advanced text-to-speech engine allows the navigation system to give spoken turn-by-turn directions; drivers can request the directions by verbally providing city name, street name, house number, or points of interest.

"At Pioneer, we build toys for cars, and IBM and ViaVoice are making our toys more useful," says Ted Cardenas, navigation systems product planning manager at Pioneer.

"Drivers are very happy to use the TTS functions because they no longer have to take their eyes off the road."

Johnson Controls' Mobile Device Gateway not only enables drivers to voice-control their car stereos, cell phones, and navigation systems; its USB ports also allow drivers to link other devices, like iPods or portable memory sticks, into the system and voice-control them as well. Some systems will also link to the car's climate controls so that drivers can dictate temperature settings. The system responds to commands in English, Spanish, French, and a number of other European and Asian languages and is compatible with 90 percent of the Bluetooth phones on the market today, says Mark Zeinstra, infotainment product director at Johnson Controls.

Alpine's NVE-N872A, a DVD-based, satellite-linked system, offers turn-by-turn directions for locations in the United States and Canada in English, Spanish, and French, and lets drivers interact with the system through voice, touch screen, or remote control. The system also functions as a personal concierge and includes a database with restaurant guides and ratings.

IBM and Avoca Semiconductor have also teamed up to voice-enable music searches provided by All-Media Guide (AMG), a music, video, and games content provider. IBM's ViaVoice speech recognition software will allow users to locate music on their personal digital entertainment devices through voice commands, and allow them to select their music by artist, title, song, or genre. A unique feature also allows for searches based on other common identifiers, such as "The Boss" for Bruce Springsteen or "CSNY" for Crosby, Stills, Nash, and Young.

Using IBM's ViaVoice, people will be able to access AMG's databases "in a very intuitive and comfortable way," said Zach Johnson, product manager at AMG.

Another IBM collaboration, this time with Fluency Voice, is being implemented to improve call center operations at Capital Blue Cross (CBC), a health insurance provider in western Pennsylvania. When the system goes live in April, CBC hopes to remove 30 percent to 40 percent of the basic healthcare coverage eligibility questions that now come into the two call centers it operates.

CBC currently receives about 21,000 calls a month from doctors and patients seeking pre-authorizations for medical treatments, and each call can take four minutes or more to complete. "Due to the volume of calls they receive, only about 66 percent of the calls actually get through," said Leah Eyler, a consultant with Fluency Voice.

In the early stages of the implementation, the call center application will be used to gather patient identification, verification, and status information and to route calls. Future considerations include authorizations, case management, and physician location services, among others.

Other voice technologies that IBM showcased at the event included Where Is It, a GPS-based service that allows users to search for the nearest restaurant, hospital, movie theater, etc., using simple voice queries; WebSphere prepackaged, speech-enabled call center applications; a unified messaging solution called VoiceRite Client; MenuMaker Web-based auto attendant; Voice Dialer and Voice-Enabled Password Reset; TALES, a real-time monitoring and speech-to-text closed captioning service for foreign language broadcasts; the Multilingual Automatic Speech-to-Speech Translator (MASTOR), which is now being piloted among U.S. troops stationed in the Middle East; Contact Center Agent Buddies speech analytics that can help agents with suggested answers to customer questions; the call center Help Desk; its latest version of Lotus Sametime, its popular office software that includes enhanced messaging and conferencing features; Reading Companion, a Web-based application that uses speech recognition to improve literacy; Caption Me Now, a dictation and multimedia presentation tool; Expressive Speech, a speech synthesis program; the Juke Box in-car voice recognition dialog system; and other technologies.

"The majority of the speech technology in use today is in the contact center," Nahamoo admitted, and IBM is working heavily to improve those operations with things like enhanced call routing, transaction handling, automated technical support, self-service, and trends analysis.

"I've been in the speech industry for 25 years, and it's come a long way," he said.

Among the areas of focus for IBM in the future will be conversational interaction, transcription and speech-to-speech translation. This last item "will have a far richer impact on society in the future," Nahamoo said. "I would put it on the same level of transportation and telephony in breaking down barriers. Transportation took distance away; the telephone took time and distance away, and translation will take cultural and language differences away."



