Bold Beginning for a New Speech Recognition System

Two completely new continuous speech recognition dictation systems have recently appeared on the scene, one from Philips and another from Lernout & Hauspie, along with new versions of existing systems from IBM and Dragon. Here are some very preliminary impressions of the new Lernout & Hauspie Voice Xpress Plus for general English.
Voice Xpress Plus is a full-featured dictation system that permits dictation directly into Microsoft Word as well as WordPad, with a street price of about $99. Voice Xpress, without the Plus, allows dictation into WordPad only, for $49. Voice Xpress Pro is coming soon and will allow dictation into “virtually all” Windows applications.
Voice Xpress (with or without Plus) has an active vocabulary of up to 60,000 words: 30,000 are supplied, and the user can add another 30,000. It permits enrollment and correction in ways somewhat similar to the IBM and Dragon systems and, like them, allows customization of the recognition vocabulary. Macros for inserting prescribed text into dictation are also available, as they are in the other systems.
A six-minute Quick Tour is available, and it is helpful. One hour of training is recommended before use, during which the user reads text presented by the program. The computer then needs another hour to process this material, as with IBM and Dragon.
The system requirements are Word 97 or Word 95 (7.0), a 166MHz Pentium with MMX, Windows 95 or Windows NT 4.0, 48MB of RAM, 130MB of hard disk space, a 16-bit Windows-compatible sound system, and a high-quality noise-canceling microphone. We tested the system on a 200MHz Pentium with MMX, 48MB of RAM, and an ESS sound system.
Voice Xpress advertises the extensive use of natural language commands for formatting and editing.
But not all the commands are “natural language.” For example, the phrase “upper case that” to produce capitalization may be useful and can be learned, but it is not the way most people speak. It is similar to the style of commands used by the other systems. Unfortunately, the commands sometimes become mixed up with the dictation, with undesired results. Fortunately, one can turn off command mode and enter a dictation-only mode, which improves the situation.
Compared with other systems, out-of-the-box accuracy is remarkable. L&H claims about 90% accuracy for some people without training, which represents a significant step toward speaker-independent speech recognition. Nevertheless, correcting one out of every ten words is not very practical for daily work. Training by reading the prescribed text improves recognition, and the accuracy rate after training was moderately good. When we corrected text and trained particular words, however, the system did not seem to catch on very quickly. The most important factor is how accurate the system becomes after repeated use. Our initial testing could not determine this; only time will tell.
Although built on Lernout & Hauspie’s previous natural language command and control system, as well as Kurzweil’s discrete word dictation system, this is nevertheless a first attempt at a new continuous dictation system. The system seemed a bit buggy at times and crashed occasionally. Additional refinement and smoothing are needed, as one might expect with a first edition.
The correction mode is not quite as easy as it might be. When a mistake is made, one can highlight the misrecognized word. One must then spell out the entire corrected word, either by voice, saying the letters one at a time, or by typing. If part of the misrecognized word is correct, one may instead modify the existing word by typing or by voice.
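To put the accuracy claim discussed above in concrete terms, here is a minimal Python sketch of how word accuracy is commonly scored. The review does not say how L&H measured its 90% figure, so the word-level edit distance used here is an assumption, and the function names and sample sentences are purely illustrative.

```python
def word_errors(reference, hypothesis):
    """Count word-level substitutions, insertions, and deletions
    (Levenshtein distance over words, case-insensitive)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Accuracy as 1 minus errors per reference word (assumed metric)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference text must contain at least one word")
    return 1.0 - word_errors(reference, hypothesis) / ref_len


def corrections_needed(word_count, accuracy):
    """Expected number of words to fix in a document of word_count words."""
    return round(word_count * (1.0 - accuracy))


# Hypothetical dictation sample: one of ten words is misrecognized.
spoken = "please send the report to the office by friday morning"
recognized = "please send the report to the office buy friday morning"
accuracy = word_accuracy(spoken, recognized)
```

Under these assumptions, a 500-word letter dictated at 90% accuracy would leave roughly `corrections_needed(500, 0.90)` words, about fifty, to fix by hand, which is why the untrained figure is impressive but not yet practical.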
Voice spelling mode allows only the regular alphabet, not the international alphabet. A separate spelling mode exists, but it requires a mouse click to activate, and one must then leave dictation mode. The process of correction was easier with IBM and Dragon.
Voice Xpress contains a screen reader, a text-to-speech synthesizer that reads text from the screen in a “computer” voice, like the other products. This screen reader worked with Microsoft Word but did not function within WordPad. In its current iteration, Voice Xpress cannot play back the speaker’s voice to remind him or her of what was said in case of misrecognition. Recording and playback of the speaker’s voice is a highly useful feature that may be incorporated into the system in the future.
At times, the speed of recognition can be quite striking, with text appearing almost instantaneously. An indicator window tells the user whether the system is “listening,” “processing,” or “hearing noise.” This helps because the machine performs better when one begins dictating while it is in the “listening” state. Of course, this means that the user must look at the screen, a distraction some would prefer to avoid during dictation. All these problems would be lessened if the speaker’s voice were recorded behind the recognized words for correction purposes.
In summary, Lernout & Hauspie’s Voice Xpress marks the bold beginning of a new continuous speech recognition dictation system. Further refinements of the product will no doubt be forthcoming.

Peter Fleming, a speech recognition consultant, may be reached at aris@world.std.com or (617) 923-9356.
