March 8, 2004
By Judith Markowitz Principal - J. Markowitz, Consultants
Forward Thinking

Pluses and Minuses

These are exciting times for speech processing. The marketplace has begun to smile on speech recognition (ASR) and is looking favorably at speech synthesis (TTS). It’s also making muted but encouraging noises about speaker authentication (SA).

ASR is more robust in challenging acoustic environments. Good human-factors design is making applications more robust and natural seeming, the industry is selling solutions rather than technology, and application development is faster and easier. As for TTS, the human-like sounds of concatenative TTS are moving to the fore, parametric TTS is becoming more natural sounding, and all of these systems are slimming down in price, size and resource requirements. SA technology is becoming increasingly robust and is enhancing accuracy through the incorporation of knowledge verification and other data, and vendors are starting to focus on building solutions.

Where are we? Here are a few pluses and minuses on our industry report card.

Standards
The impact of the growing adoption of VoiceXML and SALT is rippling through the industry and has brought with it a number of benefits.

Rapid development: On the plus side, rapid-development tools are a terrific starting point for a broad spectrum of developers capable of imbuing their work with intelligence, sensitivity and skill. The SpeechTEK Challenge held last fall proved that these tools can be used to build very good speech-recognition applications in just a few hours. On the downside, the Challenge also demonstrated that it’s possible to build bad applications in the same amount of time. This problem has already begun to land developers in hot water. For example, I was recently asked to serve as an expert witness in a lawsuit filed by an end-user company against the developer of its speech-IVR system.

Solutions
Widespread acceptance of standards underlies the shift from selling technology to selling solutions. Increasingly, those solutions are taking the form of packaged applications for widely used functions, such as auto attendant and password reset. Many companies are adopting this strategy. ScanSoft’s recent acquisition of LocusDialog, the leader in speech-based auto attendants, was part of ScanSoft’s packaged application strategy. Many other companies have either predicated their businesses on packaged applications or added packaged applications to their product lines. Among the latter are Nuance and Intervoice. On the downside, the industry has not yet developed the kind of distribution channel that will ensure deep penetration into markets.

Not fully standardized
VoiceXML and SALT are changing the marketplace for ASR and TTS. The adoption of standards by the speech-processing industry has not yet extended to SA. This means that developers must use proprietary tags in order to incorporate SA into their standards-based ASR and TTS systems.

The VoiceXML Forum may extend the scope of its specification to SA. Fortunately, there are existing biometric standards that could be tied to VoiceXML and SALT. BioAPI is the API standard, and ANSI X9.84 Biometric Information Management and Security for Financial Services is a security wrapper: a guide for developing secure biometric systems and keeping those systems secure throughout their life cycle. Both are American National Standards Institute (ANSI) standards and both are on a fast track for approval as International Organization for Standardization (ISO) standards. As with VoiceXML, acceptance of both BioAPI and ANSI X9.84 is growing in the biometric industry. By linking speech-processing standards with existing biometric standards, the industry would shorten the time and effort needed to create a standard and would open all of speech processing to developers in the biometric industry. The downside is that neither the VoiceXML Forum nor the SALT Forum has decided to extend its specification to SA.

Testing and evaluation
On the plus side, ASR has reached a point where performance testing of technology is secondary to operational testing of applications. On the downside, TTS and SA are still at a stage where performance testing and evaluation can provide some utility. Dr. Caroline Henton’s hour-long stress tests of TTS systems (run at several past SpeechTEK conferences) demonstrate the need for formal testing that can highlight strengths, weaknesses and areas of rapid advancement in TTS. Like TTS, SA could benefit from formal testing under controlled conditions done by knowledgeable laboratories. The absence of performance testing of SA systems permits perpetuation of the belief (widespread in the biometrics industry) that SA is not as reliable as other biometrics. The results of one test by the UK’s National Physical Laboratory contradicted that misguided belief, but that test included only one SA system. Furthermore, end users are clamoring for performance testing of biometrics because they want to understand whether and how to use biometric-based security.

Tallying the results
The speech processing industry has labored long and hard to earn the attention it is starting to receive from the marketplace. The final tally is not yet in, but it is clearly a positive one, even though we still have a lot of work ahead.

Dr. Judith Markowitz is the associate editor of Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
