Speech Technology Magazine

Look Before You Leap: Is Speech Dictation Right for Your Company?

By Renee Griffith - Posted Jan 1, 1999

As more employers consider speech recognition, they need to address how to implement this new technology successfully. Many factors contribute to the successful application of speech recognition, but an initial evaluation of the job to be performed is essential. Proper training and technical support will then be required, and a reasonable amount of time needs to be budgeted so the worker has a realistic opportunity to maximize the productivity benefits of the speech recognition system.

With the large variety of speech recognition dictation products now available, it can be confusing to select the best package for your needs. There are at least 20 different titles available on retail shelves today, including continuous speech and discrete speech products. The latest releases of continuous speech software have just started to offer command and control of not only Word 97 but also many other Windows applications, all using natural language commands.

Evaluate Applications
An evaluation process should involve reviewing the applications the employee uses and the percentage of time spent in each. Not all jobs requiring the use of a computer can be competitively performed using speech recognition software. Equipment on site should be examined to determine whether an upgrade or replacement is necessary. The environment in which the speech recognition software will be used must be evaluated to ensure that there will be no problems with background noise or coworkers.

After reviewing the particular needs of the employee, a determination can be made regarding which speech recognition software package to recommend. Different packages have different advantages and disadvantages depending on what specific tasks need to be accomplished. Training and support recommendations should be made only after completing an evaluation.

In addition to evaluating speech recognition software and your own computer processing power, don’t forget to take a serious look at the job where speech recognition will be used. In cases where an injured worker is being retrained, consider these factors in deciding whether the worker can become more productive with speech recognition:

Physical duties: It is necessary to thoroughly identify the physical duties that an employee is required to perform. Speech recognition software can do nothing to relieve discomfort from physical activities, and it is important to realize that from the beginning.

Computer duties: A complete evaluation and description of the necessary computer duties is required in order to assess accurately which speech recognition package is best and how much time will be required to implement the speech recognition software effectively.

Existing application/computer skills of employee: Many times, an individual’s assessment of his/her own skills is over- or understated. Often, the employee, supervisor and vocational rehabilitation counselor have different ideas of just how computer "literate" an individual is. Simple software functionality testing can identify the weak areas, and training can then be recommended.

Equipment
For optimum performance of today’s speech recognition packages, a Pentium II/333 should be considered the minimum. A minimum of 64 MB of RAM is recommended; however, 128 MB is preferable because today’s applications are extremely RAM intensive. In fact, if your equipment can accept more than 128 MB of RAM, put more in! If your environment is NT 4.0, we strongly recommend 128 MB of RAM as a minimum. All the software packages require a SoundBlaster 16 (or compatible) or higher sound card.

Each speech recognition package supports its own set of operating systems, which may include Windows 3.x (discrete only), Windows 95 and 98, and Windows NT 4.0. All the continuous packages are 32-bit applications and can operate in NT 4.0. However, extremely limited command and control is available in NT 4.0 with any of the speech recognition packages.

It is vital to assess the computer equipment currently used by the employee to determine whether an upgrade or replacement of the system is necessary. In addition, it is extremely important to identify the type of computing environment in use (network, mainframe or UNIX), because certain speech recognition packages work better in some environments than in others. It must also be determined whether the employee uses standard Windows software packages or custom software packages. This will influence the decision of which speech recognition package is appropriate.

Speech recognition software today is extremely robust in its ability to filter out background noise, and offices and cubicles are ideal environments for it. Even so, the location of the employee’s work area matters (near noisy equipment or next to the conference room, for example). Surrounding materials, such as glass (large windows) or concrete walls, also make a difference in the performance of speech recognition software. All these factors should be addressed in an evaluation.

System Features
Three characteristics to examine when selecting a system are recognition accuracy rate, speed of dictation and size of vocabulary. Recognition rates are around 95%. The speed of dictation varies according to the type of hardware used, the experience level of the individual and whether a discrete or continuous speech dictation product is being employed. In general, discrete speech dictation can reach 60 words per minute with 97% accuracy. Continuous speech users are achieving 140 words per minute or more, with 95% accuracy.

An active vocabulary is one that resides in RAM (random access memory). The total vocabulary refers to the entire dictionary available on the hard disk. Total vocabularies can run up to 230,000 words, with add-on vocabularies and foreign languages available to accommodate specialized industries and international uses.

Current continuous dictation packages also include special utilities and tools to help build custom vocabularies for industry-specific needs. In this process, the continuous software "reviews" the documents you instruct it to examine and produces a word list, which is then edited to remove unwanted terminology. The system must then be trained on any words that are not in the speech software’s language model. Once enrollment and vocabulary building are completed, accuracy should be extremely high.

Certainly, hands-free operation of a PC is one of the major benefits of speech recognition software. This type of software can potentially reduce the occurrence of Repetitive Stress Injury (RSI) or give relief to someone already suffering from RSI. For those who do not suffer from RSI, an increase in productivity is possible thanks to the formatting, editing and navigation controls available. For those who don’t know how to type, or who type slowly, continuous speech software makes text entry faster and more accurate. Finally, speech recognition opens up jobs previously denied to disabled people who cannot use their hands to operate a computer.

Discrete speech requires an individual to pause between each word, and this can cause frustration when learning. Most of us talk at a rate in excess of 150 words per minute and therefore feel slowed down by discrete speech. With continuous speech products, there is less user frustration and less strain on the vocal cords. If an individual has a speaking disability, the discrete packages are often necessary for recognition. However, continuous speech software is improving in this area.

Voice strain is a potential problem when using speech recognition, especially for an individual with RSI or hand disabilities, who will be using his/her voice much more than someone who can combine mouse and keyboard with speech recognition. Attention should be paid to the employee’s voice to identify any signs of strain and, if necessary, an appointment should be made with a technical voice instructor to learn proper breathing and speaking techniques. Continuous speech recognition software may help to relieve this problem.

Motivation and Patience
Most importantly, it takes tremendous motivation and patience to be successful with speech recognition software. Many injured individuals need between 30 and 50 hours of work with the software to establish a good voice profile and create the macros necessary to reach maximum productivity. The software also needs time to learn how an individual speaks. This is the most frustrating part. Constant attention and stringent correction are required to get excellent recognition accuracy and speed. If the employee doesn’t invest this preliminary time, he/she may be frustrated by a low accuracy rate. Even for people who are not injured and are using continuous speech packages, the initial investment of time will be around 12 hours to establish a good vocabulary and high accuracy.

For those who have little or no computer experience, it is absolutely imperative that they learn how to operate a computer using speech recognition from the beginning. This applies both to a new employee and to an employee who previously performed a job function not involving a computer but is being retrained for a computer position. Such an individual will require classes in speech recognition, Windows 95/98 and any other software package required to perform a specific job.

Our view is that even the experienced computer user, familiar and comfortable operating a PC in the Windows environment, should take a minimum of nine (9) hours of speech recognition training. The first six hours (Introductory Training) consist of an overview of the speech recognition software, proper voice dictation techniques, program commands, simple voice macros and establishing a solid and accurate voice profile. The final three hours (Advanced Training) help develop experience and proficiency in the use and development of more complex voice macros for a specific work situation. This training applies to both discrete and continuous packages for individuals with hand injuries or disabilities. For non-injured individuals, six hours is enough to get someone going successfully with continuous speech.

Technical Support

Just knowing how to use the speech recognition software is half the battle. Knowing how to integrate speech recognition with existing applications is the other half. In cases where proprietary or custom software is involved, on-site training to assist with integration is vital to the individual’s success. An experienced speech recognition instructor can help the employee create macros quickly, so that he/she becomes productive sooner and with less frustration.

As with all software, ongoing training is vital to continued success. The speech recognition companies have traditionally released upgrades approximately every 9-12 months. With each upgrade, new features become available, and the person using speech recognition may need training in the newest version to remain productive.

The speech recognition companies offer a variety of technical support options. Most of them offer 90 days of free technical support from the date of purchase. After the initial 90-day period, individuals can purchase support contracts directly from the manufacturers. In addition to this technical support, many speech resellers offer custom support packages, which are necessary where custom software is used. Although the speech recognition companies can’t support every possible custom software package, most of the problems associated with implementing speech recognition are resolved within the 90-day period, particularly following an evaluation and proper training.

For supporting more than one individual, a training program for MIS/IT people should be considered. Training the Help Desk staff to answer questions and troubleshoot most technical problems helps achieve a successful rollout of this technology.
This technology can be used to help return injured workers to their computer jobs, as well as to reduce the risk of RSI and increase productivity for the non-injured worker. Using a combination of speech recognition, keyboard and mouse makes computing fast and safe. Properly implementing this technology on a wide scale can reduce the costs involved in preventing repetitive stress injuries.
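
As a rough illustration of the productivity argument, the dictation figures quoted under System Features can be turned into estimates of correctly recognized words per minute. The sketch below is simple arithmetic in Python, not a measurement. The function name is made up for this example, and it assumes that every misrecognized word needs exactly one correction.

```python
# Illustrative arithmetic only. The speeds and accuracy rates are the
# figures quoted in this article; the function name is hypothetical.

def dictation_throughput(words_per_minute, accuracy):
    """Return (correct words per minute, misrecognized words per minute)."""
    correct = words_per_minute * accuracy
    errors = words_per_minute - correct
    return correct, errors

# (dictation speed in words per minute, recognition accuracy)
modes = {
    "discrete": (60, 0.97),
    "continuous": (140, 0.95),
}

for name, (wpm, accuracy) in modes.items():
    correct, errors = dictation_throughput(wpm, accuracy)
    print(f"{name}: {correct:.1f} correct words/min, {errors:.1f} to fix/min")
```

On these assumptions, continuous speech delivers about 133 correct words per minute against about 58 for discrete speech, but it also leaves about seven words per minute to correct instead of about two. That is one reason the time budgeted for correction and training matters.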


Renee Griffith, Chief Technical Officer and Founder of Zephyr-TEC Corp., has established a national reputation in her field and has been a featured speaker at national and local conventions of organizations involved in ergonomics, safety and risk management. After becoming disabled in 1991 with De Quervain’s disease, she used speech recognition software to start Zephyr-TEC, now an industry leader specializing in speech recognition training, implementation, integration and support.
Sidebar: Types of Speech Software
Speech recognition software typically performs one or more of these functions:

Dictation: Continuous dictation allows the user to input speech directly into various applications, such as Word, WordPerfect or Excel, in a continuous and natural manner.

Command and control: Some speech recognition software allows the user to command and control the Windows desktop and environment. This includes moving between applications and documents, dropping menus, controlling applications and creating shortcuts.

Navigation: Navigation is moving through your document to allow editing or formatting. Usually this involves using arrow key movements, such as "up-5" or "move-down-3", to move the cursor around the document or application. Natural language commands allow you to say "go to the fifth paragraph" instead of driving the cursor to the fifth paragraph using navigation commands.

Editing: Editing is one of the most productive uses of speech recognition software. Editing commands are used to select text, paragraphs or pages and to cut, copy or paste within the same document, between documents or even between applications. With natural language commands, there are a variety of ways to say commands. For example, you can say "select previous 12 words" or "bold the previous 12 words," combining multiple actions in one command.

Formatting: Formatting is greatly enhanced with speech recognition. Simply saying "paragraph-justified" or "underline-line" is considerably faster than using a mouse to select text and then executing the keystrokes to perform the formatting. Again, with natural language commands, editing and formatting are frequently combined in a single continuous command.
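
To make the navigation example concrete, the sketch below shows one way a command such as "up-5" or "move-down-3" could be expanded into repeated arrow-key presses. It is a hypothetical illustration in Python, not the grammar of any actual speech product. The pattern, the function name and the inclusion of "left" and "right" are assumptions made for this example.

```python
import re

# Hypothetical grammar for illustration: an optional "move-" prefix,
# a direction, and a repeat count, as in "up-5" or "move-down-3".
NAV_PATTERN = re.compile(r"^(?:move-)?(up|down|left|right)-(\d+)$")

def expand_navigation(command):
    """Turn a spoken navigation command into a list of arrow-key presses."""
    match = NAV_PATTERN.match(command.strip().lower())
    if match is None:
        raise ValueError(f"not a navigation command: {command!r}")
    direction, count = match.group(1), int(match.group(2))
    # One list entry per arrow-key press.
    return [direction] * count

print(expand_navigation("up-5"))
print(expand_navigation("move-down-3"))
```

A natural language command such as "go to the fifth paragraph" does not fit this rigid pattern, which is the practical difference the sidebar describes: it has to be interpreted as a request rather than expanded key by key.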
