Dragon Slays Robot
In his book Only the Paranoid Survive, Intel's Andy Grove uses the term "strategic inflection point" to describe what liberal arts people call a paradigm shift: a point in time where a business can go one of two ways. While the choice at the time may appear to be between two very similar paths, choosing the right one leads to great success, while the wrong choice can be catastrophic. For example, the development of the Internet and e-commerce confronts many marketers with a strategic inflection point.
Shakespeare was more eloquent, if less politically correct, when he wrote:
There is a tide in the affairs of men,
Which, taken at the flood, leads on to fortune;
Omitted, all the voyage of their life
Is bound in shallows and in miseries.
(Julius Caesar, Act 4, Scene 3)
If speech technology can be said to have had such a moment, it was in April 1997, when Dragon Systems unveiled the first general-purpose, large-vocabulary continuous speech recognition product, Dragon's NaturallySpeaking. Quite soon thereafter, IBM rolled out its ViaVoice product, and since then both Lernout & Hauspie and Philips have joined the fray.
Prior to that, speech dictation products had to make tradeoffs of one sort or another regarding vocabulary size and speaking style. Speech products either had small vocabularies or required users to speak discretely, with ... pauses ... between ... each ... word, a process many users found cumbersome and one that made others feel they were talking like robots.
Long-time industry insiders realize that continuous dictation is not the only aspect of speech technology, but it is clearly the segment of the market that is reaching into retail stores and making the greatest impact with the general public. When most people think of speech recognition, they think of talking to computers.
Dragon was founded in 1982 by speech industry pioneers Drs. Jim and Janet Baker, and it had a history of technological breakthroughs even before the NaturallySpeaking rollout. Dragon Systems introduced the first speech recognition capability for a portable PC in 1984; the first commercially available large-vocabulary, general-purpose dictation system (with discrete speech) for PCs in 1990; and the first commercially available software-only dictation systems to support Windows applications in 1993.
Dr. Jim Baker developed the original Dragon speech recognition systems while at Carnegie-Mellon University, under the auspices of the government's Advanced Research Projects Agency (ARPA) Speech Understanding Research program. He introduced the efficacy and power of stochastic processing techniques and Hidden Markov Models to the field of speech recognition.
Dr. Janet Baker formulates and develops new business initiatives and manages strategic relationships. She focuses core technology development on short- to long-term business opportunities and collaborates with Dragon partners worldwide. Well known as an industry spokesperson, Janet recently took the time to answer some questions from Speech Technology magazine.
Is speech now at a point where it can be called a mainstream product?
It is an emerging mainstream product, in its initial stages. Most people still don't have copies, so speech is not mainstream yet. But it is emerging very fast, starting with the release of large-vocabulary continuous speech in NaturallySpeaking. The type of growth we are seeing now could not have happened solely with products geared to specific industries.
It is emerging as a mainstream product because it is the closest analogy to how people speak to one another. There are certainly many cases where text is necessary, but most of our communication is with speech. If we had to communicate differently, we would communicate much less.
What does the future hold for speech?
Speech is going to have broader and broader penetration at a variety of levels. It has taken off on the desktop initially. But you will see speech integrated into many information appliances, handheld computers, and wireless configurations. It will become embedded in solutions and devices.
There will always be ways for the technology to improve, but there are no longer technological barriers to speech becoming mainstream. People are going to find speech to be more productive and it will make their lives more convenient.
What vertical markets make the most sense for speech?
Medical and legal are the largest, but there are many verticals where speech can be used effectively. As people become more specialized, the ability to integrate their applications with other applications in a significant way becomes very valuable. The radiology department needs to be able to integrate its records with the hospital billing department, for example. Pharmaceuticals is another area where speech is an advantage. Being able to deal with prescriptions that are computer-written has a clear advantage over relying on the druggist to read the doctor's handwriting.
Hospitals and doctors can bill insurers and receive payment more rapidly. You have similar arguments within legal and other specialized areas where people are involved in critical hands-on tasks and still need access to databases of information. Workflow software is an increasingly important area for many businesses.
Speech can be an important input modality for these solutions.
How does an industry which produces a product that ultimately is transparent to the end user make an impression on that user? Can users really distinguish between accuracy rates of 95% or 97%? Are end users actually interested in quality?
In principle, I agree that it is possible to take speech for granted because a speech interface is not always visible.
But in surveys we have conducted, as well as those done by other companies, accuracy has emerged as the number one concern for users. The distinction between 95% and 97% sounds narrow only because both numbers are in the 90s. If you look at it from the other side, comparing one system with 5% errors to another with 3%, you are talking about the first system making almost twice as many errors as the second. People notice that.
Beyond accuracy, the overall quality of the product definitely matters. Right now, with NaturallySpeaking, our Preferred edition is outselling our Standard model, although the accuracy rate for both products is about the same. I understand that something similar is happening with ViaVoice. People really care about quality.
What advantages does speech have compared with other interfaces? Or is it better to think of speech as complementary?
I definitely think of speech as complementary to the others. We do have users who use speech exclusively, but speech has significant value when you can mix it, and we design systems that can be mixed with other technologies.
Different people do their work differently. There is not just one way of creating data and text, and we want to give users a great deal of flexibility.
What are some of the primary factors driving the speech market right now?
The reality, as far as I can see, is the breakthrough in 1997 with NaturallySpeaking. When that became available, it spawned a lot of applications that were not practical before then. It was a critical breakthrough, and it could not have happened until speech could generate arbitrary text.
The big market growth has been horizontal. It wouldn't be happening with vertical applications alone.