Speech recognition dictation, an ever-changing field, continues to be altered in radical ways. The purchase of Dragon and Dictaphone by L&H made a small world even smaller. There were previously four players in the world of large vocabulary speech recognition. Now there are three. Philips has built a recognition engine, and has focused on developing business partnerships with possible allies who can employ that engine. In fact, the new version of the general English dictation product from Philips has a reputation for great accuracy. That leaves two more major competitive players in the United States, L&H and IBM.
IBM's new Millennium Edition is superb in its recognition accuracy and configuration. It has command-and-control and macro capabilities, and allows users to spell words by dictating letters into the correction window. In fact, the correction process has been streamlined. The program seems to guard against corruption of the user's voice file by carefully processing updated user information before adding it to that file, and it even offers the option of leaving a well-functioning voice file unchanged. IBM's speech synthesizer/screen reader/text-to-speech component is among the best. IBM's product also seems to run well on somewhat older, slower machines with less memory than its rivals require, making it of potential interest to businesses wanting to save money. Integration of the IBM product with Microsoft Windows and other Windows products seems imperfect (which, of course, may not be IBM's fault).
IBM also offers versions for Macintosh and Linux. L&H may do the same in the future. The IBM dictation product for the Macintosh has a reputation for high accuracy. Moreover, it can be combined with command-and-control features through a product from a separate vendor called MacSpeak, which apparently uses the Philips speech recognition engine to good effect, and which is said to integrate well with the IBM system for the Macintosh. Mirabile dictu!
L&H at present continues to offer Dragon as a separate product line, but one imagines the two might eventually converge on a single product line. New companies exploring an entrance into the field of large vocabulary dictation speech recognition face obstacles because of the many years of work needed to create such complex programs. Large vocabulary speech recognition is one of the most computationally intensive applications in all of computer science. Moreover, to our knowledge, the systems from these three companies are the only commercially available large vocabulary systems in the whole world.
Microsoft was rumored in the past to be working on its own large vocabulary speech recognition dictation system. Whether such work will bear fruit remains to be seen. Microsoft's investment of some $40 million in L&H a few years ago raises the question of whether it might instead have been thinking of incorporating L&H's dictation system into its operating system or software product line. L&H has had the advantage of integrating more tightly with the Microsoft operating system and software products than IBM, Philips or Dragon. Microsoft's previous investment in L&H no doubt made integration easier. Whether future integration will remain so tight remains to be seen. Whether the changes in competitive pressure will result in less innovation is unclear.
L&H acquired Kurzweil's speech recognition system, and made it their own. To merge the work of Kurzweil and Dragon is certainly an interesting proposition. It does not necessarily follow that merging products yields a better one. The result could be a worse program, and good features could be lost. Or the merger might truly produce a better product. There is gradual progress in this field, with time, toward more and more accurate recognition and more user-friendly programs. The future also lies in the use of speech recognition on small hand-held devices.
Such hand-held devices include not only recorders, but also palm-top type PDAs, to which one can dictate and on which the text can be corrected on the spot. No doubt IBM, L&H and Philips - and their partners - will work to make speech recognition dictation products smaller, lighter and more mobile. When the entire unit fits into a pocket, speech recognition devices will probably become more widely used by those who need or wish to write. Speech recognition continues to evolve in exciting and unexpected ways. Stay tuned here for new developments.
Peter Fleming, a speech recognition consultant, can be reached by telephone at 617-923-9356 or by e-mail at aris@world.std.com.
