Why Input Devices Matter

Many factors affect the quality of speech recognition systems, and one that might not get enough attention is the role of input devices. Noise cancellation and microphone placement are just two of the contributing factors, but they can be critical to the quality of the output. While these factors are rarely top of mind when the time comes to purchase speech software or a speech-capable device, paying attention to a few small details can help you get the most out of your application.

The first concern should be the type of microphone or headset used. Martin Markoe, CEO of eMicrophones, says, “Devices that come packaged with the recognition software are usually not optimal and should only be used as a backup in case your good microphone breaks.” Eric Larsen, product manager for Fellows, feels, “The devices that usually come with speech recognition software are usually not ideal for collection of sound for speech recognition applications. They are generally omni-directional devices that can not help, but instead gather excessive ambient noise.” Fellows’ headsets utilize noise-canceling microphones that are considered passive devices. They operate on a near-field/far-field principle: the user’s spoken words enter the front side of the microphone, while far-field ambient sound (noise) enters the back side, which lets the microphone differentiate between the two.

Many vendors provide high-quality microphones, and many have specialized equipment designed for speech recognition. One such company is VXI with its TalkPro headset. This headset is designed specifically for speech recognition software and incorporates several features that suit it to that purpose. One such feature is noise-canceling software. It also incorporates what VXI calls “translator technology,” which ensures full compatibility with whatever soundcard the user has installed.
Not being fully compatible with the user’s PC soundcard can create problems for speech recognition.

When asked about the differences among headsets on the market today, Larsen said, “Many headsets on today’s market have microphones which indirectly port for noise canceling, resulting in significantly less cancellation of ambient noise.” According to Larsen, “Direct porting systems, where the rear port of the microphone is designed for optimal noise cancellation performance, are better suited for speech recognition.”

As an alternative to the headset approach, Acoustic Magic offers an array microphone called the Voice Tracker, which, according to eMicrophones, works well with speech recognition software. It has been certified by ScanSoft for use with Dragon NaturallySpeaking 6.0. The microphone sits on the user’s desk, usually in front of the computer, where the user would be speaking. A unique feature is that the microphone elements follow the user’s voice as the user moves around the room. It is better suited to people who do not like the restriction of headset microphones. While recognition accuracy is arguably better with headset microphones because of their closer proximity to the user, an array microphone allows the user more freedom.

Philips takes a slightly different approach with a handheld device called the SpeechMike. The device is only part microphone; it also incorporates a speaker for listening to output, a trackball that replaces the computer’s mouse, and programmable buttons that can be customized to the user’s needs. It was designed with the belief that speech recognition can be used not only for dictation, but also for navigating other computer functions, such as email and the Internet.

For the headset category, device placement has much to do with accuracy.
The microphone housing of the headset, most often referred to as the boom, needs not only to be placed at an appropriate distance from the mouth, but also to be placed in the same location every time the speech software is used. While vendors differ slightly in their descriptions of the best boom placement, most say that about one inch away from the mouth and out of the direct line of breathing is most helpful in reducing recognition errors. After several uses, and possibly some use of a mirror to help remember proper placement, the positioning will become more natural. All speech recognition software products offer testing for microphone performance, and using this feature will also help to ensure the microphone is properly placed for best results.

One of the largest concerns with microphones and headsets for speech recognition is noise cancellation. Without proper noise cancellation, the accuracy of the speech recognition software can be greatly diminished. There are many instances, both in and out of the office, where noise cancellation can play a role.

In the office, microphones with noise cancellation technology are almost always the best suited for speech recognition. Plantronics has a solution for canceling the noises associated with lower-noise environments. Its technology, called Voice Expansion, reduces line noise as well as ambient noise by up to 12 dB, depending on the level of the noise. This is effective because line noise and similar interference are very influential in low-noise environments. All of the Plantronics headsets utilize this technology. According to Nick Eisner, senior product manager for Plantronics, “Noise canceling microphones generally produce more breath noise and wind noise. On the other hand they are less susceptible to performance deterioration due to ambient noise.
The tradeoff begins to favor noise canceling microphones over omni-directional microphones when the ambient noise level is more than 60-70 dB, depending on the type of noise and the headset’s fit to the user.”

When you consider embedded speech recognition, such as that in cellular phones, the need for noise cancellation grows. As one can imagine, the amount of noise in an environment such as a car is far greater than that of the average office. There are companies that specialize in such noise-canceling technology and have derived some interesting solutions.

Michigan-based Clarity has developed a noise cancellation algorithm trademarked as Clear Voice Capture (CVC). This approach differs from other noise cancellation technologies because it isn’t really canceling the noise at all. Instead, it “listens” to all of the sound input from the microphone and then extracts the voice of interest from that input. The human voice has unique statistical properties, and anything that does not fall within the statistical properties of speech is “cancelled.” Ray Gunn, CEO of Clarity, commented, “By freeing the audio input of noise, the end user will be free to speak naturally. If the end-user is satisfied with the accuracy and performance of their products, this should equal increased adoption rates and penetration rates for speech-enabled devices.”

Microphones for cellular devices are far less expensive than PC-based microphones, which, of course, factors into the overall quality.
Gunn, when asked about this issue, stated, “Input solutions like Clarity’s Clear Voice Capture software products enable all microphones, particularly inexpensive microphones, to perform like more expensive professional elements, thereby enabling better sound quality and improved feature performance while maintaining market sensitive form factors and price points.” Clarity’s technology is currently available in the THB Hands-free Car kit, and is also offered as an after-market install in the Bluetooth Hands-free Car Kit by the Chrysler Mopar division.

Another company specializing in noise cancellation, and geared more toward in-vehicle use, is Wavemakers in Vancouver, British Columbia. Wavemakers’ approach to noise cancellation is somewhat similar to Clarity’s, but still distinct. Wavemakers’ technology, called ClearStream, re-synthesizes and optimizes the voice signal for each application, so that recognition systems end up with net negative processing times. Wavemakers considers its technology to be recovering errors that would be present in speech recognition if the technology were not in place. Richard Sones, director of sales and marketing at Wavemakers, said of this recovery, “In general, we see between 50 to 60% recovery of errors across all speech engines in the broad range of noise environments.” While Wavemakers’ technology is focused on the noisy environments associated with in-vehicle applications, it is a stand-alone product that is also compatible with many microphone-based solutions, such as directional microphones, beam forming, blind source separation microphones and phased arrays.

There are several microphone VARs that have helpful hints on their Web sites and offer extensive information about microphones, and specifically about their performance with speech recognition software.
VARs such as eMicrophones (www.emicrophones.com) and SRT Distribution (www.srtdist.com) offer product reviews and statistics on the dozens of input devices that they have tested. They also list the features of each product to help users better understand which input device is right for them.

Whether a person is using a headset, array or handheld device, there are many options that can improve the recognition accuracy of a speech system. With a little investigation, a new or even experienced user can take advantage of the benefits offered by these higher-quality devices. Most of the input devices mentioned cost less than $100, and if a person is serious about getting the most out of their speech software or speech-driven device, the investment could prove to be well worth it.
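
For readers who want to put the quoted figures in more concrete terms, the short Python sketch below converts a decibel reduction (such as the "up to 12 dB" cited for Plantronics) into the fraction of noise that remains, and shows what a 50 to 60% recovery of errors (the range cited by Wavemakers) would mean for an error rate. The function names and the 10% baseline error rate are hypothetical, chosen only for illustration; they do not come from any vendor, and the conversion uses only the standard definition of the decibel.

```python
def amplitude_ratio_after_reduction(db_reduction: float) -> float:
    """Fraction of the original noise amplitude left after a dB reduction."""
    # Standard decibel definition for amplitude (sound pressure): 20 * log10(ratio)
    return 10 ** (-db_reduction / 20)


def power_ratio_after_reduction(db_reduction: float) -> float:
    """Fraction of the original noise power left after a dB reduction."""
    # Standard decibel definition for power: 10 * log10(ratio)
    return 10 ** (-db_reduction / 10)


def error_rate_after_recovery(baseline_error_rate: float,
                              recovery_fraction: float) -> float:
    """Error rate left if a given fraction of the baseline errors is recovered."""
    return baseline_error_rate * (1 - recovery_fraction)


if __name__ == "__main__":
    # The article quotes "up to 12 dB", so 12 dB is the best case.
    print(f"Amplitude remaining after 12 dB: {amplitude_ratio_after_reduction(12):.1%}")
    print(f"Power remaining after 12 dB: {power_ratio_after_reduction(12):.1%}")

    # HYPOTHETICAL baseline: assume 10 recognition errors per 100 words.
    baseline = 0.10
    for recovery in (0.50, 0.60):
        remaining = error_rate_after_recovery(baseline, recovery)
        print(f"{recovery:.0%} recovery: {baseline:.0%} -> {remaining:.1%} errors")
```

Under these assumptions, a 12 dB reduction leaves roughly a quarter of the noise amplitude, and recovering half of the errors from the hypothetical 10% baseline would leave about 5%. Actual results depend on the noise environment and the recognition engine, as the vendors quoted above point out.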
