Using Speech for Search
Customers of Pull-A-Part, an Atlanta, Ga.-based owner of five discount, self-service auto parts stores/salvage yards in the southeastern U.S., rely on a speech-based search system to find the parts that they need.
A customer-searchable inventory is critical for the company in order to keep employee overhead low, according to Ross Kogon, chief operating officer. The company set up a searchable Web site a couple of years ago, which enabled customers to use search capabilities at its locations. In order to serve a wider variety of customers, the company needed to add voice searching capabilities as well.
"Some people don't have an Internet connection," Kogon explained. "Others may not be near a terminal when they want to contact us. Someone may be working on a car when he finds he needs a part. He can just pick up the phone and call us."
The speech capabilities expand the search features available on Pull-A-Part's Web page to customers who lack Internet access. Following the recorded prompts, customers navigate a menu to find the makes and models of cars that are currently in stock at specific store locations. After requesting make and model, the customer then asks for the specific part. If the needed part is not in stock, the customer can request a call back when the part becomes available. The system monitors a location's inventory and notifies the customer by phone or fax when a car of the correct make and model arrives.
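The lookup-and-callback flow described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Pull-A-Part's proprietary inventory system; the `PartsInventory` class, its method names, and the sample phone number are all hypothetical.

```python
from collections import defaultdict


class PartsInventory:
    """Hypothetical in-memory stand-in for a per-store vehicle inventory."""

    def __init__(self):
        self.stock = defaultdict(set)       # store -> {(make, model), ...}
        self.callbacks = defaultdict(list)  # (store, make, model) -> phone numbers

    def search(self, make, model, stores):
        """Return the stores (from the given list) that have this make/model."""
        key = (make.lower(), model.lower())
        return [s for s in stores if key in self.stock.get(s, set())]

    def request_callback(self, store, make, model, phone):
        """Remember a customer who wants to be told when the vehicle arrives."""
        self.callbacks[(store, make.lower(), model.lower())].append(phone)

    def add_vehicle(self, store, make, model):
        """Record a new arrival and return the phone numbers to notify."""
        key = (make.lower(), model.lower())
        self.stock[store].add(key)
        return self.callbacks.pop((store, *key), [])


inv = PartsInventory()
inv.add_vehicle("Atlanta", "Chevrolet", "Malibu")
print(inv.search("chevrolet", "malibu", ["Atlanta", "Nashville"]))
inv.request_callback("Nashville", "Audi", "A4", "555-0100")
print(inv.add_vehicle("Nashville", "Audi", "A4"))
```

In a real deployment the speech recognizer would supply the make, model and store names, and the notification step would place a call or send a fax rather than return a list.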
With hundreds of makes and models of cars, it would have been, at best, extremely cumbersome to use a traditional IVR system for inventory searches, added Andrew McCann, client manager for Definition 6, an IT services firm that installed the Microsoft Speech Server-based technology and interfaced it with Pull-A-Part's proprietary inventory system.
"You'd push one for Audi, two for Chevy, etc., then another number for the model. There would be too many choices," McCann explained.
"This enables the customer to search through our entire inventory system the same way that they can on the Web or in the stores," Kogon said. "It's a more fluid and friendly architecture than a generic IVR system. Centralized IVRs is where everything is going. This puts us ahead of the curve."
The system recognizes each of the four distinctive regional dialects that residents speak in the four southeastern states where the company has stores, Kogon added. Users can search one store or multiple stores. If looking for a common part (e.g., a starter that would work with a handful of General Motors cars), the customer might go to either of the Atlanta stores, but is unlikely to travel to Pull-A-Part's Nashville, Tenn., facility to obtain it. If, on the other hand, it's a part for a classic car, the customer may make the long trip. Customers can also conduct multiple car searches.
The Pull-A-Part example is only one way that speech is being used for search today, a capability that some expect to expand in the future. "We're starting to push speech for this type (inventory) of application," explained McCann.
In addition to Pull-A-Part, speech is being used for search in medical facilities such as Miami Children's Hospital (MCH), where it helps hospital staff locate critical information when they don't have computer access (see the August/September 2005 issue of Speech Technology Magazine), and in industrial environments where Internet-based searching is impossible or impractical.
Igor Jablokov, program director for multimodal and voice portals for IBM's software group, calls speech search "the killer application for handset devices." The company's WebSphere technology is the basis of MCH's speech capabilities.
According to Jablokov, an increasing portion of cell phones, BlackBerries and other handheld devices have data capabilities that are underutilized or never used at all because typing on the devices with the small buttons/dial pads is too cumbersome.
Definition 6's McCann and others also see the ability to use speech in conjunction with mobile devices as one of the future drivers of speech for search.
"Companies have invested in vertical applications and billions of dollars have been spent on broadband networks, but people with small devices never use the data," Jablokov said. "Speech [search] is better, cheaper and faster and provides access to information in less time."
IBM worked with Opera Software to develop a multimodal desktop browser that incorporates the voice libraries from IBM's ViaVoice speech technology, enabling Opera users to navigate, request information and even fill in Web forms using speech and other forms of input in the same interaction.
"By using voice, the remote user can more easily search through emails on devices with small screens," Jablokov said. The speech search capability enables the user to request emails from a specific party only. Those emails are then displayed, rather than all of the user's emails.
Jablokov foresees further expansion of speech and search as companies speech-enable more browser-based applications. IBM enables browser application developers to download a comprehensive tools package, Multimodal Tools Plus, which uses the IBM Embedded ViaVoice speech engine and the X+V (XHTML + Voice) multimodal specification. The tools package is designed to help developers write and test voice-enabled Web applications for search and other uses.
Searching for Online Audio Files
Google, Yahoo! and other popular Internet search engines can enable people to find audio and video files on the World Wide Web, but currently those searches are limited to files whose accompanying text contains the search terms. This leaves a growing number of audio files effectively unsearchable, even though they would be available if the searcher knew how to find them.
According to Robert Weideman, senior vice president of marketing and product strategy for Nuance, very few content creators take the time to add text descriptions (metadata) to audio or video content while it is being created, so many of these files can be missed by search engines using traditional search techniques.
Yet more news organizations, government agencies and corporations want to enable the public to search audio and video files, including newscasts, speeches and technical information. Audio mining using speech-to-text is seen as the most efficient way of doing this. Weideman explained, "Audio mining is all about making audio and video files more searchable."
Simply searching for AVI audio files on Google will produce 20 million hits, Weideman said. Audio mining makes these files easier to search because the application recognizes where keyword matches occur within them. The application marks each occurrence of a keyword or phrase in the audio file so that the searcher can find it and hear the keyword or phrase in context.
Dragon AudioMining uses Nuance's speaker-independent recognition engine to create XML word, timestamp and metadata information for every word spoken within rich media files. XML speech index data makes the speech information within rich media files visible to text-based Web crawlers and search products, unlocking the information hidden within digital audio and video files on the Web and within private media archives. The underlying speech recognition technology that Dragon AudioMining employs is the same that is used in the Dragon NaturallySpeaking speech recognition solution. AudioMining extends the underlying statistical models that support this recognition engine to a server architecture.
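To make the idea of a word-level, timestamped XML index concrete, here is a minimal Python sketch. The XML layout (`transcript` and `word` elements with `start` and `end` attributes), the file name and the `find_keyword` function are assumptions made for illustration; they are not the actual output format or API of Dragon AudioMining.

```python
import xml.etree.ElementTree as ET

# Hypothetical speech index: one element per recognized word, with timestamps
# in seconds. The real product's schema is not documented in this article.
SAMPLE_INDEX = """
<transcript media="briefing.wav">
  <word start="0.00" end="0.35">replace</word>
  <word start="0.35" end="0.50">the</word>
  <word start="0.50" end="0.90">fuel</word>
  <word start="0.90" end="1.30">pump</word>
  <word start="1.30" end="1.60">before</word>
  <word start="1.60" end="2.00">testing</word>
  <word start="2.00" end="2.20">the</word>
  <word start="2.20" end="2.60">pump</word>
</transcript>
"""


def find_keyword(xml_text, keyword, context=2):
    """Return (start_time, snippet) for every occurrence of keyword."""
    root = ET.fromstring(xml_text.strip())
    words = [(w.text.lower(), float(w.get("start"))) for w in root.iter("word")]
    hits = []
    for i, (text, start) in enumerate(words):
        if text == keyword.lower():
            lo, hi = max(0, i - context), i + context + 1
            snippet = " ".join(t for t, _ in words[lo:hi])
            hits.append((start, snippet))
    return hits


for start, snippet in find_keyword(SAMPLE_INDEX, "pump"):
    print(f"{start:.2f}s: {snippet}")
```

Because the index is plain text, a conventional crawler can treat it like any other document, while the timestamps let a player jump straight to the moment the word was spoken.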
According to Eric Negler, executive vice president, Coveo Solutions, Inc., a developer of integrated search solutions for the desktop, the intranet and the Web, companies who want to use "secured" (behind a firewall) search have a similar challenge in searching their files.
Coveo recently selected Nuance's Dragon AudioMining for its Coveo Enterprise Search for Microsoft SharePoint to enable its customers to search internal audio and video files using key words and phrases.
With the integrated ability to search audio and video content, all Microsoft SharePoint and Coveo Enterprise Search customers will be able to add rich media as a supported content type for search and retrieval.
The most immediate deployment of the audio and video search-enabled Coveo solution will be to the U.S. Army and U.S. Navy. Negler said, "The Navy wanted us to find a way to search its video files."
The Navy maintains much of its information on repair of equipment in video files in addition to manuals. According to Negler, by using speech-enabled search, a Naval engineer in the field can search for and retrieve not only any text instructions on a repair, but also how-to videos.
Other Internal Searches
Beyond browser-based searching, companies are using speech to search their own internal audio files, said Anna Convery, senior vice president of product marketing management for Nexidia, Atlanta, Ga.
"When I first joined the company in 2004, commercial enterprise search was not scalable for the enterprise; now we're seeing multimillion dollar deals," said Convery, who expects to see significant expansion of speech used for search in the second half of 2006. "The need has existed but the technology to conduct these searches has been too slow for adoption beyond what is absolutely required by regulation," Convery added.
"There are 53,000 contact centers that have millions of hours of audio recordings that they use for training and other purposes," Convery said. "They've spent too much on the hardware not to make audio searchable. Brokerage firms need to keep recordings to abide by compliance standards."
The compliance issue has come more into focus in the last couple of years as the Securities and Exchange Commission has requested phone logs and voicemails from a number of different companies and has the ability to search for problematic words, phrases or other incriminating information in hundreds of hours of calls.
According to Convery, while the need has always been there for search capabilities, it's only been in the last 18 months that search engines have become fast enough to efficiently search these files.
The recent evolution of phonetic search engines has made this process much less resource-intensive, with files searchable at much faster than real time, according to Convery. The actual speed of a search depends on its complexity (the more terms, the slower the search) and on the speed at which a company's storage system can stream the audio files to the search engine.
"The technology has come a long way," Convery said.
The growing popularity of podcasts has also led to increased demand for speech search capabilities. Outsourcers of podcasts are now among the biggest customers of speech search capabilities, explained Convery.
Recorded media can come from many sources, such as security and surveillance, contact centers and consumer interactions across industries, compliance initiatives in industries such as financial services, insurance and health care, and market and business intelligence across industries.
Nexidia's technology is based on phonetics. That means searched terms need not match letter-for-letter, but simply sound-for-sound. In this way, a search for 'stewart' or 'stuart' will bring back the same results with the Nexidia search engine.
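The sound-for-sound idea can be illustrated with the classic Soundex algorithm, which maps words that sound alike to the same four-character code. This is a deliberately simple, text-only stand-in: Nexidia's engine matches phonemes in recorded audio, and nothing here reflects its actual implementation. The `soundex` function below is written for this example.

```python
# Classic Soundex consonant groups: letters in the same group share a digit.
SOUNDEX_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a word."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate same-coded consonants
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated consonant is kept
    return (code + "000")[:4]


print(soundex("Stewart"), soundex("Stuart"))  # both print S363
```

Because both spellings reduce to the same code, a search keyed on the code rather than the letters would return the same results for either one.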
The audio is indexed (made searchable) by the Nexidia media search and analysis solution at over 55 times real time on a single CPU. Dual CPUs double the processing speed, three CPUs triple it, etc. This audio is searched for key terms and phrases related to the business drivers of the organization. The results of these searches are written to a database and exposed to users in a reporting view that enables them to examine the results and get multi-dimensional views of the intelligence contained in their media files.
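As a rough back-of-the-envelope check on those figures, the short Python sketch below estimates how long an archive would take to index. It assumes a flat 55 times real time per CPU and reads the scaling claim as throughput that grows linearly with the number of CPUs; the function name and the 1,100-hour archive are made up for the example.

```python
def indexing_hours(audio_hours, cpus=1, speedup_per_cpu=55.0):
    """Estimate wall-clock hours needed to index an audio archive.

    Assumes each CPU indexes at speedup_per_cpu times real time and that
    throughput scales linearly with the number of CPUs.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return audio_hours / (speedup_per_cpu * cpus)


# A hypothetical 1,100-hour call archive:
print(indexing_hours(1100, cpus=1))  # 20.0
print(indexing_hours(1100, cpus=2))  # 10.0
```

Real-world throughput would also depend on how quickly storage can deliver the audio, so these numbers are best read as a lower bound on indexing time.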
The user can drill down to the media files from these results and using the media replay client either listen to these results or conduct additional investigation through simple ad-hoc searches.
If broadcasters, call centers or other providers of audio archives use centralized preprocessing of media, they can distribute these preprocessed files to remote installations, where the affiliates or branches can conduct searches without additional preprocessing.
So while in its infancy now, speech will become a more critical element of search in the future, according to McCann.
"The majority of workers are looking more toward speech as a familiar interface," McCann explained. "It has easy usability for non-technical workers. It won't totally replace traditional searches with keyboard and mice. The uptake of Voice over Internet Protocol will also play into that." (For more on speech and VoIP, see the January-February 2006 issue of Speech Technology Magazine.)