Speaking Frankly

In the speech recognition dream, you can talk to your computer freely, without wearing anything on your head and without a cord dangling from your ear. The first crop of desktop array microphones has arrived, moving us closer to that ideal by delivering cord-free dictation.

Headset microphones have been the dominant input method for speech recognition because of their consistency: the microphone element stays at a fixed distance from your mouth, even if you move around or turn your head. But many potential users stay away from speech recognition because they don't want to wear a headset or be tethered to their computer by a cord. The popularity of portable voice recorders that work with speech recognition has grown partly because of this reluctance to wear a headset.

In a microphone array, a group of microphones acts as one to reduce noise in the speech signal, providing clear, consistent sound input even if the speaker moves around. The desktop arrays now available, from Andrea, Labtec and GN Netcom, each have between four and eight microphones mounted in a row.

HOW ARRAYS WORK
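The principle explained in this section, shifting each microphone's signal so your voice lines up across all of them and then adding the signals, is commonly known as delay-and-sum beamforming. The Python sketch below illustrates it under simplifying assumptions; it is not the method used by any of the products named here: a single straight row of evenly spaced microphones, a distant sound source, a pure test tone instead of speech, a talker direction that is already known rather than estimated by the array, and steering delays rounded to whole samples. For microphones spaced d apart, a sound arriving at angle θ off straight ahead reaches each successive microphone d·sin(θ)/c later, where c is the speed of sound, so a source directly in front (θ = 0) produces no time difference at all. The function names, the 16 kHz sample rate, the 1 kHz tone and the roughly 4.3 cm spacing are all hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at about 20 °C


def arrival_delays_s(num_mics, spacing_m, angle_deg):
    """Extra travel time (seconds) to each mic of a straight, evenly spaced row,
    for a distant source angle_deg off straight ahead (hypothetical helper)."""
    theta = math.radians(angle_deg)
    return [i * spacing_m * math.sin(theta) / SPEED_OF_SOUND_M_S
            for i in range(num_mics)]


def delay_and_sum(channels, shifts):
    """Advance each channel by its whole-sample shift, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, shift in zip(channels, shifts):
        for t in range(n):
            src = t + shift  # read later in this channel to undo its arrival delay
            if 0 <= src < n:
                out[t] += ch[src]
    return [v / len(channels) for v in out]


def steered_rms(num_mics, spacing_m, fs, freq_hz, source_deg, steer_deg, n=800):
    """RMS output for a unit sine tone from source_deg, with the array steered to steer_deg."""
    # Simulate what each microphone hears: the same tone, arriving slightly later.
    arrivals = arrival_delays_s(num_mics, spacing_m, source_deg)
    channels = [[math.sin(2 * math.pi * freq_hz * (t / fs - d)) for t in range(n)]
                for d in arrivals]
    # Steering: undo the delays expected for steer_deg, rounded to whole samples.
    shifts = [round(d * fs) for d in arrival_delays_s(num_mics, spacing_m, steer_deg)]
    out = delay_and_sum(channels, shifts)
    middle = out[n // 4: 3 * n // 4]  # skip the edges, where some channels run out
    return math.sqrt(sum(v * v for v in middle) / len(middle))


# Hypothetical 8-mic array; spacing chosen so a 30-degree source is 1 sample per mic.
fs = 16000
spacing = 2 * SPEED_OF_SOUND_M_S / fs  # about 4.3 cm
talker = steered_rms(8, spacing, fs, 1000, source_deg=0, steer_deg=0)
noise = steered_rms(8, spacing, fs, 1000, source_deg=30, steer_deg=0)
print(f"{talker:.2f} {noise:.2f}")  # prints "0.71 0.45"
```

With the array steered straight ahead, the on-axis tone passes at full strength, while the same tone arriving from 30 degrees off to the side drops to about 64 percent of its amplitude; steering toward 30 degrees instead would restore it. How strongly an off-axis sound is weakened depends on its frequency and on the array geometry, so real background noise is attenuated unevenly rather than by one fixed amount.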
Arrays rely on the physical fact that sound waves take time to travel. The sound of your voice (any sound, in fact) reaches the different microphones at slightly different times. If you face the array head-on, the time differences are smaller than if you are off to one side.

Arrays include digital signal processing circuitry that analyzes these time differences almost instantly. The circuitry shifts the timing of each microphone's signal so that your voice lines up across all of them, then combines the signals, which makes your voice stronger. Sounds arriving from other angles, such as background noise, do not line up in the same way, so their signal strength decreases. There are many sophisticated variations on this scheme, but all exploit the small differences in when your voice reaches each microphone in the array.

Microphone arrays substantially improve recognition accuracy over ordinary desktop microphones, which work poorly for speech recognition. With a single-element desktop microphone, the volume and quality of the sound it picks up vary tremendously unless the speaker stays in one place.

PRESENT AND FUTURE
Now that an array microphone sits atop my monitor, I find myself much more likely to use speech for short commands and brief e-mail messages. Previously I used speech recognition only to dictate several paragraphs or more, because putting on a headset was a minor inconvenience.

Array microphones will soon be built into monitor casings and perhaps into some keyboards. Within a year, we'll see arrays with an effective range of 6 feet or more, compared with today's 3-foot maximum. Voice-activated handheld devices such as PDAs will also include arrays to improve their recognition. My array "wish list" includes models with dozens of microphone elements pointing in all directions around the computer. With such an array, you could pace in front of your computer or look up at the ceiling while talking, and it would still deliver a clear, consistent sound signal.

Today's array microphones pick up more background noise than headset microphones do. That is an unsurprising consequence of their range: arrays are designed to cast a wider net for sound than a single headset microphone element. They pick up background conversations and keystroke noise, and the speech recognition software inserts small extraneous words as a result. Arrays would be more useful if the microphone or the speech software could detect and remove keystroke sounds. Also, all speech software should offer a "push to talk" key, not just a microphone on/off key, to make it easy to dictate quick commands and single sentences.

Still, the arrays available now are useful today, and they also give us a glimpse of the "talk anytime, anywhere" future.

Dan Newman is president of the speech software consulting firm Say I Can Inc. in Berkeley, Calif. His free newsletter is available online at www.SayICan.com.
