Speech Technology Magazine


Gaining Acceptance Through Impressive Results

By Judith Markowitz - Posted Nov 7, 2005

There are many reasons why speech recognition (ASR) and speech synthesis (TTS) are moving strongly into the mainstream. While they are not perfect, the quality of these technologies is impressive. There are excellent tools, reusable code, packaged solutions, and other developer resources that have helped to lower prices and shorten time-to-market. Powerful speech servers and platforms provide robust resources and environments for building and deploying large, complex applications, and they make it possible for those applications to grow even larger. The incorporation of human factors has produced applications that are much more usable and, therefore, more acceptable to a broader spectrum of users. The list goes on.

All these developments have contributed to the "mainstreaming" of ASR and TTS. Yet, whenever I'm asked to identify the most important contributor to the strong and growing acceptance of ASR and TTS, I give them only a fleeting thought. Then, the first thing that I say is "standards" - in particular, the speech industry's acceptance of the VoiceXML standard and other standards, such as SALT, CCXML, and MRCP.

Acceptance is the key. For standards, it's not a case of "if you build it they will come." Simply having available standards does not mean anyone will use them. We might have high-quality, usable technology today even if the speech industry had not embraced VoiceXML. But instead of standards-based tools and platforms, we would have proprietary environments, and it's unlikely that the speech-integrator segment of the industry would be as vigorous as it is with standards.

VoiceXML bioSIG
Work is under way right now within the VoiceXML Forum (and, hopefully, by the time this column appears, in the W3C as well) on a VoiceXML extension for speaker identification and verification (SIV). It's the result of active campaigning by some vendors, developers, and me, accompanied by increased market interest.

Earlier this year, the VoiceXML Forum supported the formation of an informal biometrics special-interest group: the bioSIG. The bioSIG's mission was to develop requirements for including SIV in VoiceXML 3.0, and it began a schedule of weekly conference calls co-chaired by Ken Rehor of Vocalocity and me.

The forum's initial goal was to determine how much interest there is in having an SIV extension. It quickly determined that the interest level is high. Representatives from both speech and biometrics companies participated so regularly that the forum quickly changed the bioSIG from an informal to an official VoiceXML Forum SIG. In May, Rehor gave a presentation on the bioSIG's activities at the W3C meeting in Berlin. In August, we met with the VoiceXML Forum at SpeechTEK and prepared a first draft of SIV requirements for a presentation at the September meeting of the W3C.

The requirements are not limited to basic SIV functions, such as enrollment, verification, and identification. They include considerations related to concurrent SIV and ASR processing, buffering, rollback, and catch elements. There are also requirements designed to ensure that the SIV specification is consistent with other standards, notably MRCP version 2 (which includes SIV) and BioAPI (ANSI/INCITS 358-2002 / ISO 19784-1), the generic biometric API. The document also contains a list of "use cases" illustrating the many ways in which SIV can be deployed, including multi-biometric and other multi-factor implementations.

In 2006, we expect that SIV activities will advance in both the VoiceXML Forum and the W3C. The quality of the resulting specification is tied to its ability to capture the ways in which speaker verification and identification are being used and encode them in a royalty-free specification.

Those goals can only be accomplished if representatives from all segments of the speech industry become involved in the effort. While the specification is being developed, you can participate in many ways: direct involvement in the VoiceXML bioSIG and/or W3C SIV work, submission of new use cases upon which the specification can be built, testing, and feedback on requirements and specification drafts.

My hope is that you will be actively involved in this process and that you will quickly give the SIV extension your ultimate stamp of approval: acceptance.

Judith Markowitz is the technology editor of Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
