Writing out loud: Getting the most from speech dictation software

Is my computer a good listener? Until recently, this wasn't a question that ran through someone's mind when struggling with day-to-day projects on a personal computer. With the arrival of continuous speech recognition software, however, business professionals are beginning to look beyond the keyboard and mouse to a more natural input mechanism - the human voice. For them, maximizing the PC for speech recognition is key.
Of course, people wrestle with many issues when deciding whether speech technology is a good fit for them. (This was the focus of our article Tell Me About It: Why Speech Recognition Might or Might Not Be Working for You, in the June/July issue.) Once a user has moved beyond these initial questions and has actually purchased and installed a speech solution, however, there is a whole host of tips and pointers that can quickly boost the performance of speech technology. This article delves into the many ways of maximizing speech recognition software for users of the technology.
To a certain extent, today's speech recognition software is a victim of high expectations set by science fiction film depictions of talking computers. Obviously, software available off the shelf today isn't going to close an office door or open a window just because a user asks it to. But the key to successfully using today's speech recognition software is to understand what it can and cannot do. Those who have spent time with this software and used it on a regular basis know that there are many ways to maximize speed and accuracy.
One thing becomes immediately obvious to anyone using speech recognition software for the first time: it feels a little strange to speak to a computer. But with practice, the novelty quickly wears off and real productivity begins. After all, speech recognition software is simply another tool for accomplishing a task on a personal computer.
Users can tap into the software's full flexibility and power by learning the tips, tricks and practical suggestions for voice-enabled computing. There are three key areas users need to consider in order to improve speech recognition results: maximizing hardware setup, tuning software, and adjusting user behavior.

Maximizing the Hardware Setup
The first step in properly using any speech recognition software is to make sure the system is ready for speech recognition technology. The recommended system requirements for products on the market today are an Intel Pentium II or equivalent processor with 64 MB of RAM or higher. Speech technology works by comparing the spoken words to huge databases of acoustic and language data, and simultaneously running complex statistical algorithms to select the correct phrases. Because of this, the performance of speech recognition software, even more so than other categories of software, is directly proportional to microprocessor speed and RAM. Systems with more RAM and faster processors produce better results in terms of accuracy, speed and throughput rate. Fortunately, due to falling prices, the average PC available today for under $1,000 can run speech recognition software.

Microphones
Just as speech recognition technology is processor and RAM dependent, it also relies on sound input hardware, which includes the microphone and soundcard. All the major speech recognition software packages come with a noise-canceling headset microphone that has been shown to provide good accuracy. In most cases, the microphone that ships with the software is the best choice at that price point. More expensive microphones, however, can boost accuracy. Most users should begin by dictating to their speech recognition software with the microphone "in the box." Users who have become proficient at dictation and would like to upgrade their microphone can contact a vendor directly to find the one best suited to their software, dictation environment and personal preferences.
Microphones are more sensitive to the dictation environment than most beginners realize. Sudden or loud noises in the background can "spike" the microphone settings, degrading accuracy. Constant noise, however, shouldn't affect accuracy, as long as the user performs an initial sound check in that environment to tune the microphone properly. Sudden drops in accuracy may be related to unanticipated changes in background noise, like the hum of an office air conditioning unit coming on, or the sudden whir of a nearby fax machine. Users should perform a microphone check at the beginning of each new session, whenever there is a significant change in environment or voice, or whenever accuracy appears to have degraded.
A last word on microphones - they have to be plugged into the correct jack on the system and positioned correctly. A microphone plugged into the wrong jack may even work to some extent during a sound check, but problems will quickly become apparent when using the software. Similarly, it is crucial that microphones are positioned correctly to achieve good results with the software. A common mistake new users make is putting the microphone directly in front of their mouths, where normal breathing can interfere with sound quality, or twisting the microphone element so that it points away from the user's mouth. Even though new speech products include thorough descriptions and sometimes even multimedia videos to ensure these steps are performed correctly, technical support departments encounter these mistakes on a regular basis.

Sounding off on soundcards
The second key "gateway" to speech recognition is the soundcard, a device within the PC that processes audio input and converts it to a digital signal that speech recognition software can recognize. A little homework on the user's part is recommended to make sure soundcards are compatible with the software. There is no shortage of vendor Web pages to provide this information. To work with speech recognition software, soundcards must support a minimum of 16-bit recording. L&H has found that the best devices are SoundBlaster or SoundBlaster-compatible soundcards.
Audio chipsets are especially relevant to laptop computers because of their design and acoustics. Unlike desktop systems, many laptops lack an appropriate audio chipset, limiting their signal pick-up. The sleek, compact design of these systems also means that noisy cooling fans can make using speech recognition problematic. To counter this, speech recognition vendors usually test and certify laptops for their ability to support speech, and the results of this testing can also be found on vendor Web sites. New USB microphones, which come with a built-in soundcard, bypass the laptop audio architecture entirely. Many microphone vendors offer these products and users can usually purchase them for moderate prices.
Once notebook users have installed a speech-compatible soundcard, they may need to modify the computer's existing settings. Some notebooks come with a 20-decibel (+20dB) boost built into the soundcard, designed to magnify sound coming from the computer. For speech software, this can make even the initial sound check impossible. This setting must be turned off and can usually be found in one of two places. If the soundcard has its own icon in the control panel, the setting will be there. If not, the audio driver can be found in the audio section of the multimedia menu, accessed via the control panel.

Tuning Software
Fine-tuning speech-recognition software is more of a gray area than configuring the system. A number of studies on user experiences have shown that the most crucial part of the process - audio tuning - can be a major source of problems. Accuracy is greatly affected by how users set up, tune and train the software. When dictating the initial script in the audio tuning section, users should see the volume adjustment slider move. If the volume adjustment slider does not "level out" by moving from the top, accuracy may be compromised. Users should repeat the initial dictation to set the adjustment, increasing the loudness of their voice until the pointer moves down to at least one notch from the top. This will immediately increase accuracy. Skipping this important step negatively impacts the voice/background noise discrimination technology in noise-canceling microphones.

Signing up
While every speech technology vendor is working toward making its software usable right out of the box, today users must train their speech recognition software. This "enrollment" process requires the user to read a passage of text so that the software can learn the user's voice. Fortunately, breakthroughs in software have reduced enrollment time, and users can now buy speech recognition products that only require about 5 minutes of training.
Obviously, speech recognition technology is complex. A wide variety of complicated processes combine to enable the software to accurately recognize speech. But to users doing the talking, what's important is that the software recognizes their words, and that the product is easy to use. Today's applications thus incorporate a number of utilities and features to boost productivity and increase accuracy over the long term.
Users of older systems can boost performance by switching off any unnecessary PC applications that normally run in the background. For instance, when dictating in Microsoft Word, shutting off the spell checker tool will help performance. Speech recognition software doesn't misspell. Other program features in this category include grammar checkers, animated tips and fast find.
One of the quickest and easiest ways to increase speech recognition accuracy and build the software's vocabulary is to use a scanning feature found in many of today's speech applications. While this feature has different names depending on the vendor, it enables the software to "review" existing documents and adjust the speech recognition language model to more effectively match the user's writing style and vocabulary. This process also enables users to add words, specific names and places to their software's vocabulary.
Another easy way to increase accuracy and productivity is to create dictation shortcuts. These are simple voice macros that enable users to say one or two words and have a string of words inserted into a document.
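As a rough illustration only, a dictation shortcut can be pictured as a lookup table that maps a short spoken phrase to a longer block of text. The sketch below is not how any particular product implements the feature; the `SHORTCUTS` table, the `expand_shortcuts` function and the placeholder signature text are all hypothetical names and values chosen for this example.

```python
import re

# Hypothetical shortcut table: spoken phrase -> text to insert.
# The phrase and the signature block are placeholders, not product data.
SHORTCUTS = {
    "my signature": "Jane Doe\nOffice Manager\n123 Example Street\n555-0100",
}


def expand_shortcuts(recognized_text, shortcuts=SHORTCUTS):
    """Replace each shortcut phrase in the recognized text with its stored text."""
    result = recognized_text
    for phrase, expansion in shortcuts.items():
        # Match the phrase as whole words, ignoring capitalization.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        # A function replacement inserts the stored text literally.
        result = pattern.sub(lambda _match, text=expansion: text, result)
    return result
```

Calling `expand_shortcuts("Regards, my signature")` would return the greeting followed by the stored four-line signature block, while text that contains no shortcut phrase passes through unchanged.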
For example, saying "my signature" could add the writer's name, title, address, and phone number to a document automatically. This shaves time off routine tasks and boosts productivity.

It's in the way that you use it
Finally, there are a host of tips and tricks for maximizing speech technology that have nothing to do with hardware or software. In many ways, user behavior is the primary driver of dictation success rates. Knowing the capabilities and limitations of today's products goes a long way toward increasing productivity and satisfaction.
One of the best things users can do to maximize speech recognition software is realize that for most people, it's not a replacement for the keyboard and mouse, and any vendor that makes this claim is creating unrealistic expectations. Rather, speech recognition software complements existing user interfaces and workflow. While speech is the most natural interface for communicating information, it may not be the best way to accomplish every task. Productivity improvement is the goal, and users should utilize the tool that works best - voice, keyboard or mouse - in a given application.
For instance, if a user is building a spreadsheet in Microsoft Excel, it may be easier and faster to move to a particular cell within a column by voice - "move to cell B4, move to cell B11" - but easier to move from one column to the next using the TAB key. When correcting documents, some users may find it easier to select words verbally, and others may find it easier to "double click" and still get the full voice capability of correction.

Some Final Tips

Speak naturally. The worst thing a user can do if their speech recognition software doesn't understand them is to raise their voice, shout a command, over-enunciate or speak in a stilted, one-word-at-a-time fashion. The software has been trained to recognize the natural tone of the user's voice, with words spoken without interruption at the user's normal speed.
Train yourself and the software at the same time. By initially reading a written document into the software, beginning users can get a feel for "speaking like they are writing." This often yields very high accuracy, and makes that first experience with speech recognition software more rewarding.
Look away from the screen. Even though the application is converting speech to text in real time, what users see on the screen is always a few seconds behind what they're saying. This can be very distracting, causing them to pause or hesitate, and can throw off the software. Doctors, who as a group have been using dictation software professionally for a longer period of time than most, very rarely look at the screen while dictating.
Dictate full sentences instead of sentence fragments. The longer the string of words a user dictates, the better recognition rate the software achieves. This is because the software recognizes words in context to achieve higher accuracy. The more context it has, the more words it can recognize.
Speak at a steady pace. Hype aside, very few people speak clearly and naturally at speeds beyond 110 words per minute. Professional speakers can speak faster and still clearly enunciate and separate their words, but for most users a steady, conversational pace is best.
Listen to the tutorial. Users should listen carefully to how the speaker pronounces, enunciates, and dictates text and commands. This is how speech recognition software learns users' mistakes, as well as "correct" dictation.
Repeat problem words. Retraining mis-recognized words, commands and punctuation immediately improves the software's ability to get it right the next time.
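
The tip about dictating full sentences rests on the idea that the software uses surrounding words to choose between similar-sounding candidates. The toy sketch below illustrates that idea with simple word-pair counts; it is an assumption-laden simplification, not the statistical method any real product uses. The training text and the names `build_bigram_counts` and `pick_word` are made up for this example.

```python
from collections import Counter

# Tiny made-up training text; real products draw on far larger language data.
TRAINING_TEXT = (
    "send two copies to the office . "
    "send the report to the client . "
    "i need two copies too"
)


def build_bigram_counts(text):
    """Count how often each pair of adjacent words occurs in the text."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def pick_word(previous_word, candidates, bigram_counts):
    """Choose the candidate seen most often right after previous_word."""
    return max(candidates, key=lambda word: bigram_counts[(previous_word, word)])


BIGRAMS = build_bigram_counts(TRAINING_TEXT)
```

Given the sound-alike candidates `["two", "to", "too"]`, `pick_word("send", ...)` selects "two" and `pick_word("report", ...)` selects "to", because those pairs appear in the training text. With no useful preceding word, every count is zero and the function simply falls back to the first candidate, which is one way to picture why isolated fragments are harder to recognize.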

Dictation is like golf -- working on the shots that give you trouble can yield remarkable improvements in your game. Users who get the most out of speech recognition software configure the software properly on the appropriate system and then speak at a natural pace and volume, pausing to train the software when necessary. Beginners will be amazed at the improvements they can achieve as soon as they stop thinking about talking to a computer.

How do you obtain maximum performance with your speech products? E-mail Paul McNulty, vice president of Lernout & Hauspie's PC Applications Group (pmcnulty@lhsl.com), if you have a suggestion or question.
