Market Analysis: Opportunities in Speech

During the second half of 1997, Wohl Associates spent considerable time looking at various aspects of the speech marketplace, interviewing vendors and users, and forming opinions as to the status of the market and its future directions.

Our researchers:

Interviewed more than 50 vendors, including providers of everything from headsets and microphones to services and software.
Interviewed more than 100 current consumer/users of voice recognition technology and 100 actual and potential corporate users of voice technology, especially voice recognition, to discuss their future plans.
Interviewed five experts in various aspects of voice, telephony, and speech processing about the future of the industry. Our experts had been involved in the speech industry an average of 17 years and included industry luminaries Ray Kurzweil (founder of Kurzweil, now with Lernout & Hauspie) and Janet Baker (Dragon Systems).

There is no question that 1997 was an important year for speech processing. It was the first year in which commercial continuous speech recognition products were shown and delivered. Not only did such products appear, but they rapidly created a marketplace with thousands of users, established seductively low price points and appealing (and increasing) feature sets, and competed effectively in the marketplace, each gaining substantial partners. And that's just one small corner of the marketplace.

For example:

In February, Chris Shipley, host of the Demo '98 Conference (a showcase for new products and technologies), staged a speech processing bake-off, with all of these vendors - and others - showing their wares. On stage were Dragon Systems with partners Actioneer and Corel, IBM, Lernout & Hauspie, and Fluent.

Dragon showed its continuous speech voice recognition for dictation, command and control, and document formatting. It also showed, with partner Actioneer, the ability to create spoken notes which become written Action Messages attached to documents or messages. With partner Corel, it discussed a version of voice-enabled Corel PerfectOffice for legal users (a Corel focus) and announced voice-enabled versions of Corel's WordPerfect and PerfectOffice for other markets. Since IBM has voice-enabled Lotus' SmartSuite and both Lotus and L&H voice-enable Microsoft Word, this means that all the popular word processors now support continuous speech recognition input.

IBM (which has a continuous speech recognition product competitive with Corel's) chose instead to show the underlying technology in a more sophisticated application, supporting the ability to make airplane reservations with a domain-specific natural language processing interface layered onto the continuous speech voice recognition. It was possible to make and remake reservations, as in the real world, endlessly changing one's mind, one's itinerary, and one's ticket. IBM now claims to have 80% of the world-wide market for continuous speech recognition.

L&H showed a potpourri of its varied technologies, including its Voice Xpress command and control product which can format Word documents, together with allusions to its work in the automobile interface market and its forthcoming Coronado Internet search, summarize, and translate engine.

Then came a real surprise. Tiny Fluent showed an emotive sprite (a face that can be tuned to look like it's speaking and showing emotions), connected to a voice output engine that ties facial expressions to speech cadence and tone. It was actually using pre-recorded speech (which still sounds the most natural), but it can output synthetic speech from a Text-to-Speech engine. Faces can be changed (realistically or fantastically) and we've seen other emotive sprites that do speech recognition and respond with appropriate expressions.

What does this flurry of activity mean? Two points are readily apparent.

Our experts agree continuous speech recognition is going mainstream. Two major vendors are shipping, and thousands of users are buying. More vendors are coming soon and voice integrators are buying this technology to integrate it into applications from the strictly horizontal (dictation) to a variety of verticals (legal, medical, travel agent, etc.).
Many software vendors want to try speech recognition, enhanced by natural language processing, to see if the more natural interface could open their products to more users. It might also provide an easier route to getting new users trained and on board.

In our study, our experts estimated that voice would become a standard interface in the 2001-2002 time period. However, they see it entering the market sooner, with entry depending on product area. Dictation for word processing and command and control have already entered the market. Dictation to other applications and a voice interface to the computer will likely occur this year.

Voice interfaces to a full range of computer applications (with natural language processing within the applications) and to consumer applications beyond the computer, the so-called information appliance market, are expected in 1999.

Vendors	Dragon Systems	IBM	L&H / Kurzweil
Products	Naturally Speaking; Dragon Dictate	Via Voice; VoiceType	Text-to-Speech Technology; Voice Commands; VoiceXpress; Clinical Reporter; Coronado (Internet Search, On-Line Document Translation)
Target Markets	Consumer; VAR; OEM; Legal (with Corel)	Consumer; OEM, Developers; VARS; Medical	Auto Voice Interface; OEM; Medical; Consumer
Partners	Seagate Technology (35% owner); Corel	Lotus Development	Microsoft (18% owner); SGS Thomson; HAL Hitachi

Some users aren't waiting. We interviewed two groups of users for the survey.

Group One included about 100 people who had already tried speech dictation by the summer of 1997. Due to the exigencies of the voice recognition market, this mainly meant consumers (average corporate platforms aren't generally big enough to run these desktop applications) running, at that time, discrete word voice recognition and eagerly awaiting continuous speech. In our sample, these users were overwhelmingly male (common for early users of new computing technology) and were mainly from the SOHO market or were individuals, including students, retirees, and homemakers. We did, however, interview about a dozen users from larger organizations (nearly 15% of our sample).

All but one of our interviewees had used speech recognition, mostly for dictation for word processing. Almost all of our users claimed average or greater than average PC skills. More than half of them trained their voice recognition tool for 30 minutes or more. More than half were at least "somewhat satisfied" with the results and in that group 77% were still using voice recognition. About 25% of the users were dissatisfied and stopped using the products, claiming they were too slow, poor at recognition, or too error prone.

About 46% of the users said they wanted natural or continuous speech, the feature or enhancement most frequently mentioned in the interviews.

Group Two included about 100 people from the IS departments of mainly larger organizations. Our plan was to see whether larger companies were looking at voice recognition, particularly with the advent of continuous speech, and planning to try it. To our surprise (and pleasure), more than 40% of our interviewees indicated that they were currently testing voice recognition technology. This is a very high curiosity rate for a technology that's just beginning to hit the mainstream market.

The study includes information about company plans for incorporating voice technology into future applications.

The Wohl Associates White Paper on Voice and Speech Processing pointed out several interesting insights and several areas that will bear further watching. They include:

The market seems to be breaking into companies who build tools and engines and companies who use their knowledge of specific customer sets, markets, and applications to refine or focus those tools and engines, adding task specific hardware and software. This was true already in the previous generation of smaller vocabulary and discrete word voice recognition products and seems likely to be even more pronounced in this newer and broader market.
As software becomes componentized (and the market is definitely moving in that direction), there seems to be an excellent opportunity for a Java Bean component that VARs (and others) could use to more easily build custom products. We'd expect vendors with both engine and Java knowledge (IBM jumps to mind immediately, but there are others as well) to be first in this space.

We'd like to further study:

How user plans change as the technology becomes both cheaper and more sophisticated;
How the add-on software market (the phenomenon of building on top of engines we mentioned above) looks in a year;
Whether all of the products in a market sector, e.g., continuous speech voice recognition, battle it out for the same market space or whether they specialize. We believe we're already seeing some specialization but we can't tell whether it's random and accidental or planned. Again, in a year, it will be possible to look and tell much more about how the market will be organized.

Traditionally, voice markets have used the VAR distribution channel. Mainstream continuous speech voice recognition seems to be a shrink-wrapped product, perhaps even something to bundle with base hardware, like an operating system or a utility. Does this indicate that other parts of the speech software market will also steer clear of the VAR channel?

Speech is leaving its ivory tower at last. It will be interesting to see these resulting opportunities mature and to note which firms take advantage of them.

Amy Wohl is president of Wohl Associates and can be reached at Wohl Associates, Royal Plaza, Suite 309, 915 Montgomery Ave., Narberth, PA 19072. For more information about the Wohl Associates Major Report on Voice & Speech Processing, contact www.wohl.com/voice.htm.
