November 1, 2009
By Leonard Klie Editor, Speech Technology and CRM magazines
Speech Declares Its Independence

Ken Ackerman, president of The Ackerman Co., a warehousing and supply chain consulting firm based in Columbus, Ohio, first encountered voice technologies in the warehousing space in 1985 while touring a Nissan replacement parts distribution center in Japan. At the time, he was impressed with how well the workers responded to the voice—that of a very polite, professional, and attractive-sounding female.

While the technology at the time was new and sexy, employing it was a long and arduous process that involved plenty of customization, complex layers of middleware, months of trial and error followed by more months of fine-tuning, then more trial and error, and so on. Systems relied on slow, bulky hardware and sluggish, often unreliable voice recognition software. Companies that used the technology did so at great expense and at considerable financial and professional risk.

Today, speech-enabling a warehouse is much more of a turnkey process. And one does not have to travel all the way to Japan to find a warehouse that is using the technology. Chances are very good that the groceries on the local supermarket shelf, the health and beauty products at the local drugstore, and the merchandise at the big-box retailer down the street were all handled by warehouse workers using a voice application. A number of experts have even gone so far as to call voice recognition systems the most important technological breakthrough in warehousing operations in the past 25 years.

Increased speed and memory capacity have made the technology more reliable, and more advanced software delivers near-perfect recognition, even under the harshest of conditions. Bulky portable computers have been replaced by smaller, lighter, more advanced, and more agile devices that are easier to configure, use, and maintain.

And, as one might expect, the cost of the technology has come down significantly. Voice-based systems that used to cost thousands of dollars per user—mostly because they required expensive proprietary hardware and software—have given way to open-source technologies that can be used interchangeably with many different back-office systems and portable device types.

“Originally, vendors used their own hardware, and everyone was paying big money for it because they didn’t have a choice,” Ackerman recalls. “Now [the vendors] are all using off-the-shelf hardware that you can buy at Radio Shack.”

Steve Gerrard, vice president of marketing at Hamilton, N.J.-based Voxware, a provider of software for voice-driven warehouse operations, sees this as one of the most important trends in the industry in the past few years. “Up until about two or three years ago, there was a very large proprietary hardware market,” he says. “Now making [voice systems] hardware-independent has opened up the space to new vendors, which has also caused prices to come down. What could have cost $7,000 before now you can get for under $2,000.”

Also adding to the high cost in the past was the amount of time and effort involved in creating customized applications and then integrating them with back-end warehouse management systems (WMS), enterprise resource planning (ERP) systems, and other applications. But with many of today’s applications, that is no longer an issue. Vangard Voice’s AccuSpeech Mobile Voice Platform, for example, is system- and network-independent, meaning it resides entirely on whatever mobile device the warehouse worker uses every day.

“We voice-enable what they’re already doing on whatever device they already have,” says Bob Bova, president and CEO of Vangard Voice.

Bova says most warehouse operators today “are very interested in adding speech into their infrastructure,” but they don’t want to replace all their current systems, applications, and business processes to do so.

“The whole idea shouldn’t be about ripping out the guts of what you already have,” Bova says. “People do not want to reinvent the wheel to add voice.”

At the same time, hardware costs have also come down as devices have become more rugged and cheaper to produce. While the replacement device and parts market was lucrative for most vendors in the past, today’s devices can better handle most of the bumps and bruises connected with warehouse work. Devices are being built to withstand drops of several feet and can be used in freezers and very hot or humid areas of the warehouse, all of which wreaked havoc on the circuit boards and battery packs of the past. Several vendors even tout models that can be run over by forklifts without missing a beat.

“And greater ruggedness equals a greater number of transactions per hour that the device can support,” says Tom Murray, vice president of product development and marketing at Vocollect, a Pittsburgh-based provider of voice-based warehouse systems.

Served with SOA

Lower costs have also been helped along by the services-oriented architecture (SOA) now being employed by most vendors. This has created a plug-and-play environment in which voice systems can easily be integrated into any WMS, ERP, or other back-office application, as well as any device type. As a result, implementation time has been shortened from months to weeks, and a move toward multipurpose deployments has sprung up.

In the current SOA environment, a warehouse worker could use the same device—configured with an integrated scanner or radio-frequency identification (RFID) reader—for receiving in the morning, and then switch to voice picking in the afternoon simply by attaching it to a headset.

That trend is advancing quickly, according to John Schriefer, marketing manager at Lucas Systems, the Wexford, Pa.-based manufacturer of the Jennifer voice picking application. “Any company not developing software products using an SOA environment is about four or five years behind the times,” he says. “They’re just not keeping pace with current market trends.”

Voxware’s Gerrard agrees. “SOA is very much aligned with the approach of many corporations. It’s Web-centric, agnostic, and enables systems to interact with one another,” he says. “It’s the new standard, and vendors that do not have it need to put it in. It’s even more important to customers now than it was a year or two ago.”

Voxware, the first vendor to offer voice-based warehousing products in an SOA, credits VoiceXML for many of the changes that have taken place in warehouse voice systems during the past few years. “VoiceXML drove the hardware-independent trend. If it weren’t for VoiceXML, we would still have a lot of high-cost, proprietary systems out there,” Gerrard says.

Instead, the emphasis today is on multipurpose devices that incorporate other capabilities besides just voice. “Everyone should be using full [radio-frequency] terminals rather than voice-only terminals,” Schriefer says. “They should be taking advantage of the other capabilities of those terminals. They have barcode scanners, keyboards, RFID readers, and even video screens that can be used to show the worker a picture of what needs to be picked.”

An SOA also lowers costs by expanding the pool of vendors and developers that can work with the technology to perform upgrades, add-ons, and integrations. In many cases, warehouse managers can install the changes themselves, not having to rely on the system manufacturer at all, even as they expand the use of the technology to other facilities and add functionalities to it.

Levels of Dependence

Voice recognition systems that are used in warehouse operations fall into one of two categories: speaker-dependent or speaker-independent. Speaker-dependent systems require users to train the voice application to recognize their voices and unique speaking characteristics by repeating a specified series of words, numbers, phrases, and other utterances. Those unique utterances are recorded and stored in the system. Then, each time the worker starts work, he logs into the system, and the application calls up his voice profile for comparison.

By contrast, speaker-independent systems don’t have to be trained or calibrated to recognize a specific individual’s voice pattern. Rather, they rely on collected samples of recorded human speech patterns from which they create statistical models. They are designed to recognize almost everyone—theoretically, a person should be able to come in off the street and start picking orders within a few minutes.

“The speaker-independent solution has proved to be completely reliable,” says Marceline Absil, vice president of marketing and sales at TopVox, manufacturer of the TopSpeechLydia product.

But the main benefit, Absil explains, is the reduction in training time. “No training is required, which saves approximately 50 minutes per operator. For that reason, this system can easily be used by temp personnel as well.”

That is one of the greatest benefits realized by Cooper Booth Wholesale, which is based in Mountville, Pa. The wholesaler uses a speaker-independent system from TopVox in its warehouse. “Our voice solution adds to our overall profitability through improved productivity and greater picking accuracy. We can literally pull someone off the street and, after a few minutes of training, have him up and running,” says Trevor Martin, vice president of operations at Cooper Booth. “Overall, we’ve seen an increase in productivity by about 18 percent.”

Still, not everyone is convinced that speaker-independent systems are right for the warehousing environment, citing better recognition and fewer problems with speaker-dependent systems.

“Speaker independence is the wrong move,” Ackerman says. “If I were buying a system for a warehousing application, I would buy something speaker-dependent, most definitely.”

Ackerman sees reduced accuracy alone as a significant enough drawback. “From my own experience with voice systems, it’s very frustrating when a system does not understand me,” he says. “Speaker-dependent is more reliable, in my view.”

And in the fast-paced warehousing environment, the fewer times a worker has to repeat himself, the more productive he can be. “Open systems are not necessarily as robust as the warehouse environment needs. With speaker-dependent products, there are far [fewer] repeats the system needs to ask for. Recognition is better, and the workflow is improved,” Vocollect’s Murray says.

Recognition shortcomings pose an even greater risk, according to Voxware’s Gerrard. “The system has to recognize what [the worker] says. If he has to keep repeating himself, he’s going to tear the headset off and find another way to do [his job],” he says.

The main reason for recognition problems in the warehouse is the noise. Forklifts, refrigerators, freezers, conveyor belts, trucks, pallet jacks, and workers themselves all contribute to noise levels that can hamper a speech system. “We’ve seen some spectacular failures in speaker-independent settings because of noise,” says Gerrard, whose company offers speaker-dependent systems. “There are all kinds of noises—either constant or intermittent—that a system has to contend with.”

The warehouse environment is generally still too industrial for speaker-independent systems to work well, he adds.

Users of speaker-independent systems argue that noise-canceling headsets and the small vocabularies required by most warehouse systems make recognition errors a far smaller problem than people have been led to believe. Most warehousing operations can get by with built-in vocabularies of about 100 words and phrases that the worker will need on a regular basis.

ODW Logistics, a Columbus, Ohio-based third-party logistics provider, for example, deployed TopVox’s system in early 2008 at a 250,000-square-foot distribution center it operates for a number of home improvement retailers.

Jon Petticrew, ODW’s vice president of operations, says the company initially built its system with about 70 words and phrases, and has expanded its vocabulary only slightly since then. “Ours is an all-forgiving system,” he reports.

Multiple Languages

Another factor affecting recognition is that many warehouse workers do not speak English natively. Speaker-dependent systems can be programmed to handle dozens of languages, and users can even train systems to recognize responses in multiple languages. One user can respond to one statement in English and another in Spanish, and the system will handle both without any trouble.

But Vangard’s Bova doesn’t see the language issue as a deterrent. His company’s AccuSpeech Mobile Voice Platform is speaker-independent, and he reports no problems, not even with accents and dialects. “Once you build the initial grammar, you can fine-tune it for accents, dialects, or whatever else you want,” he says.

Lucas Systems weighs in on the speaker-dependency issue right down the middle, touting a hybrid approach that can support both speaker-independent and speaker-dependent technologies, even though Schriefer personally endorses speaker-dependent ones.

Schriefer touts the adaptive qualities of Lucas’ systems as a way to address dependency issues. “Jennifer does not require frequent retraining. It adapts as people talk more and more to it,” he says.

This is important, he adds, because “users talk with their Sunday best when they’re doing the training, and then talk differently when they actually get out on the warehouse floor.”

TopVox’s solution also sports a hybrid approach that enables users to combine speaker-dependent and speaker-independent systems in the same operation. “For example, 20 pickers might be on U.S. English, 10 on speaker-dependent, and five on Spanish, all of which are possible to mix and match,” Absil says.

But regardless of the type of system installed, warehouse work has become an interactive experience. “The real evolution in the industry is from a voice-directed workflow to a voice-assisted workflow,” Vocollect’s Murray says. “Instead of making a worker an automaton just going where the WMS tells him, he is able to talk back to the computer to change the overall work output.”

As an example, Murray points to a system that tells a worker to pick containers of blueberries first. The worker can tell the system to override it if he knows an order of potatoes will be picked next and that would crush the berries on the bottom.

Dialogue can’t be one-way anymore. “It’s really about workers talking to the computer, with more back and forth,” Murray says.

Speech Expands Beyond Picking

Voice-based applications have typically worked their way into the warehouse first in an order-picking capacity, facilitating communications between a warehouse management system and a worker on the warehouse floor. These applications typically relay instructions, such as where to go and how many items to remove from a shelf to fill a particular order, but recently they are gaining traction and showing greater potential outside basic picking operations. Some of these areas include replenishments, put-aways, returns-processing, cycle-counting, sortation, truck-loading, receiving, maintenance, and cross-docking.

“Picking is still the primary area. You start with picking and then extend voice to upstream or downstream processes,” says John Schriefer, marketing manager at Lucas Systems, manufacturer of the Jennifer voice-based warehousing application. “Replenishment is a good complement to the picking application. If a worker gets to a picking location and it’s empty, he can trigger an immediate response.”

But the uses go far beyond that. Shaw Industries Group, a flooring manufacturer, first installed Vangard Voice’s AccuSpeech Mobile Voice Platform in April for cluster picking in its warehouse. Now it is considering additional warehouse applications, such as yard management, dispatch, delivery confirmation, and truck inspections. “Now you can take speech outside the four walls of the warehouse,” says Bob Bova, Vangard’s president and CEO.

“It always starts with order picking—few people buy it initially for something outside of order picking,” says Ken Ackerman, president of supply chain consulting firm The Ackerman Co. “Once they discover how well it works there, they branch out. It makes sense that once they see how well speech works, they want to try it elsewhere. As the equipment becomes more common, people will come up with even more new ways to use it.”
