Tell Me About It: Why Speech Recognition Might or Might Not Be Working For You

What’s so weird about talking to your computer? For regular readers of this magazine, probably nothing. But we have all seen, and at some point experienced, the wide range of responses to the idea of talking to a PC. For some it’s the most natural thing in the world, certainly more natural than typing on a plastic keyboard. For others, however, it’s not that simple – using speech recognition software to create documents or control a computer just feels strange. Why?

One reason may be that speaking to a PC, or to any inanimate object, is fundamentally different from speaking to another person. In fact, the PC may be the first business machine we’ve ever spoken to without intending our words to be heard by another human being. We speak through telephones knowing someone is listening on the other end. We record speech knowing that the ultimate audience will be a transcriptionist or another person. Of course we all talk to our cars (sometimes in colorful language!) or plants on occasion, but on the whole, we’re used to speaking to people, not to inanimate objects. We think of speech as an exchange with someone who understands us. And until now, inanimate objects didn’t.

With the introduction of speech recognition software, all this has changed. Users are being told that dictating text and speaking commands into a headset to control Windows 98 is as easy as speaking to their best friend. For some users, it is just that easy – over the last two years they have experienced a dramatic increase in productivity and comfort by using continuous speech technology. For others, whose experience with speech technology hasn’t met their high expectations, hopefully this article (and two to follow in future issues) will explain how today’s products can yield positive results with proper set-up and training, helpful tips and suggestions, and a better understanding of the software’s capabilities.

Innovations often slowly adopted
People have been wary of some of the world’s most useful inventions. A cartoon by humorist James Thurber, famous in the 1930s, showed a worried woman eyeing a suspicious light socket in the ceiling. The caption read, "Electricity was leaking all over the house."

The fax machine is an extreme example of slow technology adoption. Although the first patent for facsimile transmission over wires was filed in 1843, and millions of dollars went into development, as late as 1970 there were still only 50,000 fax machines in the entire U.S. It took another 20 years for people to drop their preconceptions about what the technology could and couldn’t do and start using it as a practical business tool – not as a replacement for existing mail and telephone service, but as a complement to it.

Speech recognition technology may also suffer from users’ overall hesitancy to adopt new technologies, compounded by unrealistic expectations based on popular science fiction such as Star Trek or 2001: A Space Odyssey, in which people rattle off rapid-fire conversational questions and commands to computers. Those new to dictation software are often surprised to learn they must wear a headset and undergo an "enrollment" or training period, during which the software learns to recognize an individual human voice. Today’s speech recognition software doesn’t match their expectations, and this obscures the real benefits of the new technology.

"Write out loud"
There is a definite psychology to speaking to a PC. First of all, users have to be willing to wear a headset with a noise-canceling microphone. Today’s built-in PC microphones aren’t sophisticated enough to recognize a user’s voice while filtering out background noise. This means that anybody using dictation software today has to wear a headset microphone and be literally hardwired to a PC. Though more and more people seem to be wearing headsets in public – witness Madonna, Garth Brooks, Wall Street stock traders and federal agents – it still feels strange at first to most users, especially after seeing the way Dave commanded HAL to "open the pod bay doors" in 2001 while moving about the spaceship. The kind of microphone someone uses and how they use it is fundamental to achieving good results with a speech recognition product.

Dictating into a computer requires users to take a fundamentally different approach to gathering and communicating thoughts. This is because speech recognition relies on a language model that mimics the way sentences are written, which is sometimes entirely different from how we speak to each other in conversation. Even though the software recognizes continuous speech (it used to recognize only speech with pauses after each word), it still works best with "full sentence" speech that sounds like written text. Consequently, to achieve good results with today’s products, users must "write out loud," speaking in complete sentences at a measured pace – just as if they were reading text off a page.

To get a better idea of how this works, users might try dictating first by reading text without looking at the computer screen until they have finished dictating, and then try the same exercise while looking at the results. Often the appearance of text on the screen can be distracting and can subtly but significantly affect the pace and clarity of the user’s voice, and hence the final dictation results.
Users should be seated comfortably, with a glass of water nearby, and should avoid leaning over the keyboard. These simple changes in technique and approach can often dramatically improve the accuracy of speech recognition software – another area ensuing articles will explore.

Two speech extremes
Speech recognition software seems to engender two extremes in its users. There are those who use it constantly and aggressively, for letters, memos, white papers, business plans, e-mail and anything else they need to do while sitting in front of a computer screen. Then there are those who try it once or twice and never use it again.

The vast majority of those who stop using speech recognition do so right away for one of two simple reasons. The first is purely technical – dictation software won’t run on a user’s computer if it doesn’t meet the processor, memory or sound card requirements. However, thanks to ever-increasing processor speeds and memory capacity, this is becoming less of an issue. In fact, almost any new PC purchased in 1999 will support speech recognition software.

The second is accuracy. People who do get past the PC configuration hurdles can be initially disappointed by the software’s accuracy rate. Users generally expect 100 percent accuracy – they don’t expect to have to check words for accuracy. The end result is shelfware; frustrated users give up on the product before it has a chance to adapt to their particular voice. Once again, 2001’s HAL computer has raised expectation levels over patience levels.

But consider those who do continue to use speech software. Studies suggest that they are often extraordinarily evangelistic about it because they have spent time training the products to obtain maximum accuracy. They don’t just use speech recognition a little bit. They use it more than 50 percent of the time they spend on their PCs.

Between these two extremes lies a large number of users who could benefit from practical suggestions on how to get the most out of dictation software – tips and tricks for maximizing comfort, boosting accuracy rates and improving overall experience with speech technology.

Dictation software is only the first of many new applications users will operate using speech. In the coming years, users will control not only their computers, but also their household devices, automobile electronics and personal digital assistants by speaking to them. Microsoft Chairman Bill Gates has said, "Speech is not just the future of Windows, but the future of computing itself." If that’s the case, what new users really need is a practical guide to using, integrating and maximizing speech recognition for use in a wide variety of home and office environments. Hopefully, we are providing that with this and future articles.

Paul McNulty is the vice president of Lernout & Hauspie’s PC Applications Group and can be reached through Lernout & Hauspie at http://www.lhs.com.
