Is There a "Speech Industry"?

The Applied Voice Input/Output Society (AVIOS) has been promoting the practical application of speech technology for over two decades. In the early days, the immaturity of the technology and the cost of implementing it offered few opportunities to build a "speech industry." Today, the improved technology and the lower cost of deploying it support many practical applications. Many people find that speech technology forms an important part of their work, implicitly creating participants in a speech industry.

Many of those people wouldn't identify themselves as members of a speech industry. They might consider themselves part of the telephone industry, the healthcare industry, the automotive telematics industry, the industrial-automation industry, or the productivity-software industry. So what defines a speech industry, and what binds it together?

Clearly, the companies and individuals supplying the core speech technologies that drive the industry are part of it. But many more people beyond those obvious participants belong to the speech industry as well.

One could be skeptical of this point of view. After all, speech recognition, text-to-speech, speaker verification, and audio search are enabling technologies, not ends in themselves. Perhaps the right view is to look at the technology as part of the whole, much as a keyboard is part of a PC. There is no clearly identifiable group guiding the evolution of the graphical user interface (GUI), so why should the voice user interface (VUI) need such shepherding?

Part of the answer is that the VUI has evolved differently from the GUI. The GUI was essentially defined by example, through a few successful products (the Apple Macintosh, Microsoft Windows PCs, and widely used Web browsers). One could argue that the GUI has become so overburdened over time that it needs proponents of change, but its general form, on PCs at least, has become so familiar that users will resist change. (The iPod has proven that there is room for innovation on small devices, but, again, by defining the GUI by example.)

There is more controversy than consistency in today's VUI designs. There are, for example, debates over how much use of "persona" (creating a particular personality for the voice user interface) is appropriate; critics suggest, in part, that over-personification can lead the caller to speak too informally. Many applications go in the other direction and simply mimic a touchtone menu with speech, over-structuring the dialog and failing to take advantage of the increased flexibility the VUI offers. The most appropriate response to these debates may be that the answer depends on the specific case. Solid research can certainly help, but we are not likely to see one accepted approach that applies universally. And, even in specific cases, the best answer may change over time as speech technology and development tools improve.

There are some excellent books, articles, papers, and courses on VUI design that will get conscientious beginners off to a good start. An increasing number of dialog components encapsulate good practices. Unfortunately, there is no guarantee that all newcomers will be aware of these assets or avail themselves of them. As speech recognition increasingly enters the "mainstream," creating the need for many new skilled practitioners, there is a real danger that bad implementations by developers inexperienced with VUI design will give speech technology a bad reputation. Further, unreliable platforms or applications that can't be maintained over time could give buyers a negative view of the technology. All of us whose livelihood is affected by the progress of the VUI should be concerned about the quality of deployed solutions, because these will influence the perception and acceptance of the VUI by customers, opinion-makers, executives who make budget decisions, and investment analysts.

There is indeed a speech industry with a vested interest in good practices in VUI design, effective application delivery, and continued progress in core technology, tools and standards. This magazine is an example of a resource serving that industry. There is also a need for organizations like AVIOS that provide a centralized resource for promoting, finding and developing good practices and improved technology.

Do you consider yourself a part of the speech industry? If you are reading this, you probably should.

William Meisel is the executive director of AVIOS (www.avios.org) and president of TMA Associates (www.tmaa.com).

