April 26, 2005
By Judith Markowitz Principal - J. Markowitz, Consultants
Forward Thinking

Speech in the Warehouse

ASR solutions for the warehouse have been available since the 1980s. Yet, when I attended ProMat, the premier warehousing conference, I found that only two of the 700-plus ProMat exhibitors specialize in automatic speech recognition (ASR) solutions: Vocollect and Voxware.

Vocollect
Vocollect has offered a wearable, ASR-based data-collection system called Talkman since 1987. Early generations of Talkman stored data in the device until the employee's shift ended and the data could be uploaded to the company's warehouse management system (WMS). The current model communicates directly with the WMS over radio frequency (RF) links, using ASR and text-to-speech synthesis (in any of 14 languages) to carry on a real-time working dialog between the WMS and the employee. Device-based storage is used only when data must be collected in RF "dead spots." The data are uploaded as soon as the employee exits the dead spot. Talkman employs speaker-dependent ASR (the employee must train every word in the application) that's designed for high-noise environments. The standards-based software is designed to operate with virtually all current warehouse materials handling and logistics technology. On the business side, Vocollect has a global focus and partners with WMS providers, material handling integrators and specialty voice-solution providers.
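The dead-spot behavior described above is a store-and-forward pattern. The following is a minimal sketch of that pattern only, not Vocollect's implementation; the class name `StoreAndForwardLink`, its methods, and the `send` callback are all hypothetical names chosen for illustration.

```python
from collections import deque


class StoreAndForwardLink:
    """Hypothetical device-side link: send live, buffer in RF dead spots."""

    def __init__(self, send):
        self._send = send          # assumed callable that delivers one record to the WMS
        self._pending = deque()    # records collected while out of RF coverage
        self.in_coverage = True

    def record(self, entry):
        """Send the entry immediately if covered, otherwise hold it on the device."""
        if self.in_coverage:
            self._send(entry)
        else:
            self._pending.append(entry)

    def set_coverage(self, in_coverage):
        """Update coverage state; on regaining coverage, upload held records in order."""
        self.in_coverage = in_coverage
        if in_coverage:
            while self._pending:
                self._send(self._pending.popleft())


# Usage: a plain list stands in for the WMS.
wms_log = []
link = StoreAndForwardLink(wms_log.append)
link.record("picked 3 at A12")       # delivered immediately
link.set_coverage(False)             # worker walks into a dead spot
link.record("picked 5 at B04")       # held on the device
link.set_coverage(True)              # worker exits; held record is uploaded
```

A queue keeps the buffered records in the order they were collected, so the WMS sees the same sequence it would have seen with continuous coverage.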

Voxware
Voxware provides real-time, RF-based ASR solutions. In 1999, it acquired Verbex, a leading provider of ASR to factories and warehouses, and it has focused on warehousing ever since. Its system uses a belt-worn, RF transmission unit with a lightweight headset. Voxware employs standards-based software and holds a patent on the use of VoiceXML with a voice browser for wireless warehouse applications. ASR in English and U.S. Spanish employs generic word-models (called "language blueprints") that are tailored to the speaker's voice, like the adaptive models used in dictation systems. Training language blueprints to your voice takes 10 to 15 minutes. Recognition of all other languages is speaker-dependent and requires longer enrollments because each word must be trained. Prompts are recorded for all languages since the company does not employ text-to-speech synthesis. On the business side, Voxware has always provided complete solutions to its customers, although the company is developing a partnership-oriented business strategy.

Speech in the Warehouse
ASR can be used for a variety of warehouse tasks (e.g., picking, put-away, replenishment, line-loading). The greatest penetration has been for picking, which involves assembling the items in a customer's order. Using ASR, picking becomes a dialogue between the worker (called a "picker") and the system. The system tells the picker which items to pick, where they are in the warehouse's picking area, and how many are needed. The picker tells the system either that the required items have been picked or that there's a shortage that needs "replenishment." Other warehouse applications involve similar dialogues.
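The picking dialogue just described can be sketched as a simple loop. This is an illustration under stated assumptions, not any vendor's dialogue design: the `PickLine` type, the `run_pick_dialogue` function, the prompt wording, and the convention that the picker replies with a spoken quantity or the word "short" are all hypothetical. Recognized speech is represented as plain text strings.

```python
from dataclasses import dataclass


@dataclass
class PickLine:
    item: str
    location: str
    quantity: int


def run_pick_dialogue(order, replies):
    """Walk an order line by line; return system prompts and any shortages.

    `replies` is an iterable of the picker's recognized replies, as text.
    """
    replies = iter(replies)
    spoken = []       # what the system would say to the picker
    shortages = {}    # item -> quantity still needed (replenishment)
    for line in order:
        spoken.append(f"Aisle {line.location}, pick {line.quantity} of {line.item}.")
        while True:
            reply = next(replies).strip().lower()
            if reply.isdigit() and int(reply) <= line.quantity:
                picked = int(reply)
                break
            if reply == "short":
                picked = 0
                break
            spoken.append("Please repeat.")   # unrecognized or invalid reply
        if picked < line.quantity:
            shortages[line.item] = line.quantity - picked
            spoken.append(f"Replenishment requested for {line.item}.")
        else:
            spoken.append("Confirmed.")
    return spoken, shortages


# Usage: two order lines; the second reply is misrecognized once.
order = [PickLine("widgets", "A12", 3), PickLine("gaskets", "B04", 5)]
spoken, shortages = run_pick_dialogue(order, ["3", "mumble", "2"])
```

Having the picker speak the quantity back is what gives the verbal confirmation step: a mismatch between the requested and spoken quantity is caught at the shelf rather than at shipping.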

Using speech has many known benefits: Verbal confirmation of picked items and quantities reduces errors; voice data-input eliminates the repetitive movements (e.g., scanning) that cause repetitive stress injuries (RSI); and dialog-based interaction enhances speed and efficiency by eliminating the need to look at the screen of a handheld device. There are also warehouse environments, such as refrigerated warehouses in the food industry, where other data-input devices cannot be used.

Why, then, isn't ASR a dominant technology in warehousing? Paul Feorene, director of alliance operations at Manhattan Associates (a Vocollect partner), identified several factors. "Until last year it [ASR] was very expensive. Prices have declined, handling of dialects is better, and the interface has become better, but it still is pricey for a lot of customers." He believes ASR looks very good when companies evaluate ROI in terms of general safety and reduction of RSI.

"For example, when you use an RF ID gun you have to take it off your belt, scan the item, and then return it to your belt. These actions are not only repetitive but they take time. A recent study by MIT found that the time it takes to do those operations can cost a company from five cents to 25 cents. Given the number of times a warehouse worker has to do it in a single shift, the cost adds up to quite a bit and builds a taste for voice," said Feorene.

All of this is encouraging. Hopefully, it will soon translate into greater market share and increased revenues for speech companies.

Judith Markowitz is the technology editor of Speech Technology magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
