Speech’s Price War

In some ways, the cost of current large vocabulary speech recognition systems is artificial. Decades of research and development have made speech recognition programs among the most complex and expensive computer programs ever devised. However, the cost of reproducing software is trivial. The transition from hardware-dependent to software-only products indicates that speech recognition technology has become, in some sense, totally intellectual property, only a “sequence.” But, of course, the genetic code of an entire human being is expressible as “merely a sequence of nucleotides” (...after millions of years of evolution).

It is curious that a speech recognition system similar to one that sold, less than six years ago, for about 30 thousand dollars should now be virtually given away. Yet such is the case.

In 1991, Kurzweil sold its “medical intelligence” computerized dictation system for radiology or emergency medicine, including the computer and installation, for $20-30k. Training and support were available. Hospitals were relatively wealthy then, and needed abundant and rapid transcription.

Concurrently, Dragon Systems was selling its system with training and support, but without a computer, for about ten thousand dollars.

About five years ago, IBM licensed Dragon’s technology and sold a small version of Dragon (5k words) for about four thousand dollars.

A year or so later, IBM introduced its own speech recognition system. The first iteration was the “speech server series,” running on Unix-based, networkable RISC machines. Again, the cost was formidable.

Within two years, Dragon began to drop its prices and Kurzweil followed suit.

In 1992 IBM ported its AIX RISC-based system to a PC running OS/2. The system price, including a proprietary processing card, was about $1k. Dragon and Kurzweil then dropped prices for their Windows products to about $1k, including their proprietary sound cards. IBM soon offered its system in a Windows version.

Software Only

With the improvement in CPU power as Pentium systems became common, the requirement for special processor cards has become moot. All speech recognition companies now offer software requiring only a standard sound card, such as a 16-bit SoundBlaster, rather than a proprietary board or PCMCIA card.

This has resulted in a further decrease in cost, since standard sound cards are less expensive and already available in many computers, sometimes even included as part of the motherboard.

The resulting situation is that linguistically complex large vocabulary speech recognition systems, which took years of research to develop and ran on computers with complex peripherals, often difficult to install and configure, are now available as software-only packages.

The entry-level price may be further reduced by issuing smaller editions of current products. Smaller products require fewer resources for execution and thus lower equipment cost for the consumer. Systems with smaller vocabularies may be perfectly adequate for the needs of many people. Five or ten thousand words are still a lot of words!

The fact is that most people use a relatively small vocabulary, both in their work and in their personal communication. Moreover, if one needs to type an occasional word, this is usually not a problem. Furthermore, the smaller vocabulary systems are excellent for data entry, accounting, and assorted other tasks at which people spend much of their time. It is also worth remembering that as one gradually switches from project to project over time, one may add new words to a small vocabulary (5-10k words) speech recognition system. Such a system is constantly adapting not only to one’s voice and environment but also to one’s changes in vocabulary. The smaller systems may be useful as starter products to test the waters before investing more money, but in many cases may also, by themselves, be sufficient for long-term use.

The savings in memory and disk space can be substantial when using a small vocabulary compared with a large vocabulary system, say 10k versus 60k words. If one can run on 8 megabytes less of RAM and less hard disk space, one could potentially save a hundred dollars or more. Finally, processor speed is another requirement that dictates the cost of a system. It is notable that some systems may run on slower machines than advertised. Processor speed may or may not affect accuracy, depending on the nature of the particular product. Using a proprietary board with a built-in processor may sometimes be a cost-effective alternative to the purchase of a more expensive, higher-speed computer.

Buy Only What You Need

Recently, further impressive price cuts have taken place. Dragon offers a small vocabulary starter kit as separate modules (Dragon Dictate Singles) to reduce cost. These are 10k-word versions, each of which runs only with a single application such as Windows Word, Excel, WordPerfect, Lotus Notes, etc. Kurzweil offers a starter system which can be downloaded from their web page for trial. IBM also offers a starter system which actually contains its full complement of 22k words, but without some new features available in the Windows 95 3.0 version. These are excellent bargains, which give one most of the dictation power and functionality of the larger versions. All three starter systems are available for under $100 each.

IBM has incorporated a smaller version of its system into the new version of OS/2, version 4. The operating system costs approximately $150 and includes the speech recognition system for “free,” as well as many applications. It lacks certain features of the larger product, such as the ability to delay correction or to dictate directly into applications, features which are part of the new Windows version. All of the above products require discrete dictation except for macros and commands.

Continuous Speech Dictation

Here again the pricing is interesting. IBM has offered a continuous speech dictation system for X-ray reports, using a specialized vocabulary, not suitable for other purposes.

The software was initially priced at about $5k, running on NT, requiring special equipment, training, and support, elevating the total cost to between $10k and $20k including the computer.

Similarly Philips and its partners are now offering continuous speech recognition dictation products with specialized vocabularies; the software alone costs approximately $5k.

As the cost of speech recognition systems has declined, the sales have increased, but they have not by any means gone wild.

What is the consumer really waiting for? Should a system offer low cost? Be installed? Integrated? Highly accurate? Easy to use?

So where do we stand? Cost has certainly fallen. Integration has perhaps been achieved by IBM, since its speech recognition system is packaged with the operating system in OS/2 version 4. All systems require less voice training than in the past.

Dragon has achieved a high degree of integration and compatibility with other software programs.

Yet from some viewpoints, we are still not quite there. People certainly prefer to dictate in continuous natural speech, rather than having to remember to pause after each word. We postulate a thirst for continuous speech.

Cost is important. People want cheap speech recognition. Coupled speech synthesis is desirable. There is room for much more integration with the operating system and other software programs.

Ease of use is key

It would be attractive to purchase an inexpensive computer with speech recognition capability already installed and integrated. Nevertheless, slowly and gradually, we are making the long strange trip towards the Nirvana of speech recognition!

Peter Fleming and Robert Andersen, speech recognition consultants, may be reached at aris@world.std.com or (617) 923-9356.

Products for the 21st Century Office

New VAD from Nortel
Nortel recently announced the release of Voice-Activated Dialing 2.0, a product which allows residential and single-line business subscribers to call people or companies by saying the name, eliminating the need to retain, search for, or manually dial phone numbers.

“The value this type of service has provided to consumers has just accelerated the need to make telephone features more user-friendly and simple to use,” said Gavin Lee, Director, Advanced Voice Processing, Nortel. “Our leadership in speech recognition and our experience in designing powerful, easy-to-use services is now opening up new business opportunities for telecommunications service providers.”

VAD uses Nortel’s Network Applications Vehicle platform, a multimedia processing platform, designed to deliver a broad portfolio of speech recognition and visual display services.

Visit Nortel’s web site at http://www.nortel.com

VCS Announces Phonetic VX
Voice Control Systems recently announced the release of Phonetic VX, a major new speech recognition technology that can recognize spoken words from among 10,000 or more words.

Phonetic VX is an extension of the VCS approach to phonetic speech recognition technology, which permits rapid vocabulary development through VCS’ type-in tool, WordBuilder.

Phonetic VX is scalable, with a flexible architecture permitting the speech recognition to be entirely host-based or implemented in a host computer/DSP platform environment. The initial demonstration is in North American English, but the company is collecting speech databases for other languages, and Phonetic VX will be marketed through VCS’ existing worldwide network of partners, distributors, system integrators and VARs.

Contact Desmond Pieri at VCS, at (617) 494-0100 or visit the company’s web sites at http://www.voicecontrol.com and http://vpro.com

Dialogic’s TextTalk Released
Dialogic Corporation recently announced the availability of TextTalk, a software-based text-to-speech product for the company’s two and four line voice and network interface telephony boards. TextTalk represents a price-performance breakthrough, enabling telephony application developers to offer leading edge functionality such as unified messaging and text-based voice response (IVR) to the small system market at an attractive price.

“TextTalk is the newest addition to Dialogic’s evolving family of software-based speech technology products focused at small to medium sized applications,” said Douglas August, senior product marketing engineer at Dialogic. “With TextTalk, the traditional barriers to adding speech technology to an application, such as text-to-speech and speech recognition, have been removed. This allows our customers to add more value to their applications in a cost effective manner.”

Using TextTalk, the host computer can read ASCII text over telephone lines, making text-to-speech a useful alternative to access large records - such as a database of names and addresses - that are impractical to record as digitized sound files. Text-to-speech is also an enabling technology for unified messaging, where fax, voice, and e-mail messages are integrated on the user’s desktop.

TextTalk is priced at $200 per channel and comes in 2- and 4-channel versions. The product is available now through all Dialogic sales offices or distributors and can also be purchased by calling Dialogic direct at 800 755-4444 or 201 993-3030.

For more information, visit the Dialogic web site at http://www.dialogic.com

CTI Platform from Digital
Digital Speech Systems recently released a new multitasking CTI Voice Processing Platform that bridges the gap between telephone systems and computers. The WIN Series is based on the Microsoft Windows NT operating system. It consists of two parts: a voice messaging system and unified messaging workstation software that runs in the background on networked workstations and communicates with the user via pop-up windows and a graphical user interface (GUI).

The system handles up to 96 ports and 1000 hours of voice storage with up to 65,000 users, and supports Dialogic low- and high-density voice cards and SC-bus. The WIN Series supports Novell or Windows NT servers configured with TCP/IP local or wide area network protocols.

Contact Steve Milmore of Digital Equipment Corporation, Maynard, Mass. 01754-2571, or at milmore@mail.dec.com.

Speech Systems for Mobile Applications
Speech Systems, Inc. recently released Phonetic Engine 1000, a PC-card that enables mobile computers to run speech-driven applications using Speech Systems’ continuous speech, speaker-independent speech recognition technology. It offers speech recognition as a solution for eyes-free, hands-free data input and retrieval on mobile computers running Microsoft Windows.

Hardware system requirements include an IBM PC or compatible computer with a 486 (or higher) processor running at 33 MHz (or faster), 8 MB of system memory, and 20 MB of free hard disk space for VoiceMatch. Software requirements include DOS v. 3.3 or higher and Windows 3.1 or 3.11.

A PE 1000 run-time system (RTS), which includes the PC1000 PC Card, RTS software, and a noise-canceling microphone, is available for $595. Speech Systems’ VoiceMatch SDK software is $595. A bundled development system which includes both is available for $995.

Contact Speech Systems, Inc., 2511 55th Street, Boulder, CO 80301, or e-mail kathleenm@ipri.com

Verbal Database Access from Linguistic Technologies
English Wizard, from Linguistic Technology, allows end users to access databases using speech. The software responds to questions phrased in standard English and gives end users a cost-effective way to gain access to needed information.

Contact Linguistic Technology, 119 Russell St., Littleton, MA 01460, or call (800) 245-8200.

Virtual Operator from Registry Magic
A fully customizable intelligent auto attendant was recently introduced by Registry Magic. The modular yet integrated conversational office system includes intelligent auto attendant, voice mail, messenger center, and voice dialing.

Contact Neal Bernstein at (561) 367-0408.

Kurzweil VoicePad for Windows
Kurzweil Applied Intelligence recently announced the release of Kurzweil VoicePad Pro for Windows, release 1.0, which the company bills as the world’s first shrink-wrapped voice-enabled word processing application.

VoicePad Pro enables users to seamlessly integrate voice input with the keyboard and mouse, creating a natural and intuitive approach to document creation. Users can create memos, letters, reports and other documents simply by speaking into their PC.

“The introduction of Kurzweil VoicePad Pro marks the first time that a fully voice-enabled business software application has been introduced at a consumer software price point,” said Mark D. Flanagan, executive vice president of Kurzweil AI, and general manager of the company’s PC applications group. “We are providing users with an easy and cost-effective way to experience the benefits of state-of-the-art voice recognition.”

Kurzweil VoicePad Pro for Windows, Release 1.0, is an expanded, shrink-wrapped version of Kurzweil VoicePad for Windows 95, a shareware product which has been downloaded over 25,000 times since it was introduced by Kurzweil AI in June.

VoicePad Pro does not require any training and can be used “right-out-of-the-box” with 90 percent or higher recognition accuracy. In addition, the product automatically adapts to the user’s speech and language patterns, boosting ongoing recognition accuracy to 97 percent or higher.

In addition to creating documents through dictation, users can also format text, navigate through the application menus and dialogs, change the application settings and preview and print documents.

VoicePad Pro fully integrates Kurzweil AI’s discrete dictation capabilities with its state-of-the-art continuous digit recognition, so users can quickly and efficiently enter telephone numbers, street addresses, zip codes, dollar amounts, social security numbers and other numeric data directly into the documents as they create them.

Contact Kurzweil Applied Intelligence, Inc. at (617) 893-5151.

Hands-free Computing from Voice Pilot
Voice Pilot for Windows 95 offers speech-driven desktop tools including a calendar module, to-do list, address book, memo module, note pad, internet voice chat, and automatic language translator. Distributed by Voice Keyboard, Voice Pilot includes a run-time version of IBM’s VoiceType Dictation for Windows 95 version 3.0 and a noise-canceling microphone.

Contact Bob Heinking at (954) 432-0888.
