Voice Biometrics - Are You Who You Say You Are?

Not long ago, an article on speaker authentication or voice-based biometrics in Speech Technology Magazine would have had to include a great deal of introductory information. Today, many in our industry are well aware of speaker authentication, and a growing number of companies have incorporated speaker authentication and, in some cases, speaker identification into their offerings. This increase in awareness is, in part, a by-product of 9-11. During the last few years of the 20th century, the market for speaker authentication had been climbing a gradual slope of hype, familiarity, and interest, like many other emerging technologies. Following the 9-11 attacks, speaker authentication seemed to burst onto the scene along with other biometrics, such as face recognition. From that point forward, biometrics remained in the public eye.

MARKETS AND APPLICATIONS

Familiarity does not translate into immediate market acceptance. In fact, when one looks at the marketplace outside of intelligence and law enforcement, the adoption path is probably not strikingly different from what it would have been had 9-11 not occurred. That is, non-government deployment of speaker authentication and other biometrics still parallels the patterns of other emerging technologies.

Outside of law enforcement and intelligence, the corrections industry remains the largest established market for speaker authentication. That market includes all organizations charged with authenticating, monitoring, and tracking criminal offenders, pre-trial defendants, and others in the justice system. Those individuals may be incarcerated in a jail or prison, or they may be involved in a community-release program (e.g., parole, probation). Although the inmate population is large, most products for the corrections market are designed to support community corrections programs.
These tools verify that offenders are abiding by court-ordered restrictions on their movements outside their homes. For example, a juvenile offender may be allowed to attend school during specified hours but be required to remain at home the rest of the day. An adult offender may be required to be at a given work location from 8 a.m. until 5 p.m., attend AA meetings from 6 to 8 p.m. on Tuesdays and Thursdays, and then return home. These systems may be configured to call offenders at random times throughout the day, to receive calls from offenders, or both. Typically, they are designed to operate only over landline telephones, and virtually all of them have incorporated Internet-based access for supervising officers and other authorized individuals.

Password reset is a fast-developing commercial market for speaker authentication. Because it is an employee-facing application, it allows the deploying organization to test and evaluate the technology and assess its benefits before moving to customer-facing deployments. Technologically, password reset is fairly simple, so its impact on the organization's computing infrastructure can be limited and deployment time can be relatively short. Password reset can also produce a fairly rapid return on investment (ROI). All of these attributes are useful not only for evaluating speaker authentication but also for building a business case for using the technology in other kinds of applications.

Speaker authentication is also starting to move into call centers. As with speech recognition, call-center deployments have the potential to become a huge market. Call centers are under increasing pressure to automate as a way of reducing cost, attenuating the impact (and cost) of agent turnover, and providing services 24/7. Usually, speaker authentication is partnered with speech recognition for customer-facing and partner-facing applications.
Most often, speaker authentication is added to existing speech-recognition applications, but it is an increasingly popular feature of new deployments as well.

Some call-center applications extend the definition of speaker authentication. For most applications, speaker authentication is synonymous with speaker verification: a one-to-one comparison of the caller's voiceprint with the system's stored voiceprint for the identity the caller claims. In the call-center arena, however, there are many applications where more than one person is authorized to access information or engage in secured activities (e.g., joint accounts). When those people share a password, which may occur when the password is an account number, the system needs to compare the caller's voiceprint with the stored voiceprints of all the authorized individuals. When the system only needs to determine whether the speaker belongs to the group of authorized speakers, the process is called speaker classification. When it must determine which group member is speaking, the process is called speaker identification. Either case entails one-to-many matching.

There has also been a change in the way speaker authentication is marketed. The selling points have moved away from fraud reduction and pure security; now both vendors and their customers are talking about ease of use and ROI. This shift indicates that the market is starting to believe that the technology works, which means the focus can turn to standard business-case issues. It is also a sign that the market is beginning to mature.

COMPANIES AND TECHNOLOGY

There is increasing differentiation among the companies that are stepping up to provide speaker authentication products and services. The number of active integrators (that is, companies actively involved in deploying speaker verification, such as OTG, Vocent, and Voice.Trust) is still much smaller than the number of engine developers.
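The difference drawn earlier between speaker verification (one-to-one) and speaker classification or identification (one-to-many) for joint accounts is easiest to see side by side. The Python sketch below is a minimal illustration under stated assumptions, not any vendor's method: it assumes each voiceprint is a fixed-length feature vector, uses cosine similarity as the match score, and applies a hypothetical acceptance threshold of 0.9. The function names `verify`, `classify`, and `identify` and the sample voiceprints are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence

# Assumption: a voiceprint is a fixed-length feature vector.
Voiceprint = Sequence[float]


def score(a: Voiceprint, b: Voiceprint) -> float:
    """Cosine similarity between two voiceprints (an assumed scoring method)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(probe: Voiceprint, claimed_id: str,
           enrolled: Dict[str, Voiceprint], threshold: float) -> bool:
    """Speaker verification: one-to-one match against the claimed identity only."""
    return score(probe, enrolled[claimed_id]) >= threshold


def classify(probe: Voiceprint, group: List[str],
             enrolled: Dict[str, Voiceprint], threshold: float) -> bool:
    """Speaker classification: one-to-many; is the caller any authorized member?"""
    return any(score(probe, enrolled[m]) >= threshold for m in group)


def identify(probe: Voiceprint, group: List[str],
             enrolled: Dict[str, Voiceprint], threshold: float) -> Optional[str]:
    """Speaker identification: one-to-many; which member is speaking, if any?"""
    if not group:
        return None
    # Score every authorized member and keep the best match.
    best = max(group, key=lambda m: score(probe, enrolled[m]))
    return best if score(probe, enrolled[best]) >= threshold else None


if __name__ == "__main__":
    # Hypothetical enrolled voiceprints; "alice" and "bob" share a joint account.
    enrolled = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
    joint_account = ["alice", "bob"]
    caller = [0.1, 0.98, 0.12]  # a caller whose voice resembles bob's
    THRESHOLD = 0.9  # hypothetical operating point

    print(verify(caller, "alice", enrolled, THRESHOLD))          # False
    print(classify(caller, joint_account, enrolled, THRESHOLD))  # True
    print(identify(caller, joint_account, enrolled, THRESHOLD))  # bob
```

In this sketch, classification can accept a caller as soon as any group member clears the threshold, while identification scores every member to pick the best match. In both one-to-many cases the work grows with the size of the authorized group, unlike one-to-one verification.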
Although many engine developers are still pure-play voice-biometrics suppliers, a number of speech-processing companies have entered the industry with technology designed to operate in conjunction with speech recognition. There are also more companies headquartered outside the United States. The platforms they deploy on include chips and boards, laptops and desktops, and servers. The technology deployed on those platforms now includes text-independent authentication and identification as well as text-dependent and text-prompted technology. Each of these has its own strengths and weaknesses, of course, but the commercialization of text-independent technology extends the options available to customers. A number of patents awarded in 2003 address issues as disparate as aging of voiceprints, new compression techniques, scoring based on content plus acoustics, verification of the consistency of enrollment samples, and audio-visual fusion.

CHALLENGES

The challenges facing the industry are both technological and market-based. Voice biometrics still needs to improve its handling of noise, far-field input, and channel/device mismatch. Because speech recognition faces these same challenges, advances made on its behalf in these areas also benefit voice biometrics. On the market side, widespread acceptance requires addressing and overcoming implementation and acceptance issues as diverse as privacy, voiceprint size, and cost. These and other factors are all part of an unfolding story that will continue for a number of years.

Dr. Judith Markowitz is the associate editor of Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.