January 30, 2007
By Judith Markowitz, Principal - J. Markowitz, Consultants
Forward Thinking

Hold the Pickle II

In my last column, I described some of the challenges that face speech recognition in drive-through facilities connected to fast-food restaurants. This column presents a solution proposed for one set of challenges in that article: those related to acquiring a good speech signal.

Background clamor at drive-through facilities includes engine noise, horns, radio blare, passing traffic, and raucous passengers. That noise may be accompanied by gusting wind and other weather, such as rain or snow. Customers speaking in this cacophony may have soft or hoarse voices, strong foreign accents, or speech that is muffled by wool scarves or veils. Most speakers also respond to noisy environments by involuntarily raising their voices, an effect called the Lombard reflex, which is itself known to degrade the performance of speech recognition systems.

The proposed solution for capturing quality voice input at a drive-through is array microphones. "One immediately suspects that problems with microphones constitute the primary bottleneck for speech at drive-up facilities, but that's not the case," explains Douglas Andrea, president of Andrea Electronics, a Bohemia, NY-based manufacturer of hardware and software microphone technologies. "Effective microphone array technology has been available for over seven years."

Exit41, a drive-through systems integrator for fast-food restaurants like Wendy's, Burger King and Panda Express, is deploying Andrea's microphone-array technology at some restaurant drive-through windows across the country. "Acoustically, the array microphones have performed much better than single-element microphones, especially in noisy environments like drive-through lanes that abut a highway," explains Marcel Koster, Exit41's product manager. "Array microphones have allowed two major acoustic benefits: background noise reduction and acoustic 'focus' on the desired target—the customer."

The audio that is captured by array microphones is cleaner as well. "When you're looking straight at it, it has a beam shaped like a tall ellipse around the ordering window," says Andrea. "That's the sweet spot. It has flexibility from top to bottom." This means that a customer's voice is in the sweet spot whether the customer is in a truck or a sports car. Tailpipe and engine noises are canceled out because the front and back of the customer's car, and the cars ahead of and behind it, fall outside the cone.
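For readers who want to see why a row of microphones favors sound arriving from straight ahead, here is a minimal sketch of delay-and-sum beamforming, a textbook array technique. The article does not say which algorithm Andrea's products use, so this is an illustration only. The function name `broadside_gain` and its default values (four microphones, 5 cm spacing, a 2 kHz tone) are assumptions chosen for the example, not product specifications.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 C


def broadside_gain(angle_deg, num_mics=4, spacing_m=0.05, freq_hz=2000.0):
    """Relative gain (0 to 1) of an unsteered delay-and-sum line array.

    angle_deg is the arrival angle of a distant source, measured from
    straight ahead of the array. All defaults are hypothetical.
    """
    theta = math.radians(angle_deg)
    # Extra travel time between neighboring microphones for a far-field source.
    delay = spacing_m * math.sin(theta) / SPEED_OF_SOUND
    # Add up the phase-shifted copies of a unit tone seen by each microphone.
    total = sum(
        cmath.exp(-2j * math.pi * freq_hz * delay * n) for n in range(num_mics)
    )
    # Dividing by the microphone count makes the on-axis gain exactly 1.
    return abs(total) / num_mics


if __name__ == "__main__":
    for angle in (0, 15, 30, 60, 90):
        print(f"{angle:>3} deg: gain {broadside_gain(angle):.2f}")
```

A source directly in front of the array reaches every microphone at the same instant, so the copies add up at full strength. Sound from the side arrives at each microphone at a slightly different time, and the copies partly cancel. Because the gain depends on frequency and spacing, a real product would need to handle the full speech band rather than the single tone used here.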

Array microphones are clearly attractive for drive-through ordering, but deploying them in any outdoor facility is not entirely straightforward. The devices must resist environmental assaults, such as temperature extremes, high humidity, rain, snow, and dust. "This is doable," Andrea asserts. "In fact, we've designed products to resist those environmental effects in an outside environment."

Device Hardening
The human element represents yet another threat to drive-through transmitters. Anything outdoors can become a target for vandalism. "You don't want someone throwing a Coke and gunking it up," explains Andrea, "or hitting it with a baseball bat." For this, Andrea recommends using array technology that distributes microphone elements across a horizontal plane (called "broadside") because it can be embedded or placed behind a screen.

In deploying such technology, though, positioning is the key to success. The microphone will work properly only when the people placing orders talk into the "sweet spot." If they overshoot or undershoot it, their voices will be canceled along with any other ambient noise. A quick and easy solution is to position menus or other physical elements so that they naturally lead the customer into the cone. Depending upon the kind of restaurant it is, says Andrea, "you could even make it look like a smiley face and the speaker could be just above it."

What would something like that cost? In very small quantities, the price tag is around $350 (retail); for high volumes, it can drop to less than $200 per unit. That doesn't include the hardening.

Let me know of other solutions for this challenge.

Judith Markowitz is the technology editor of Speech Technology magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
