For the past few years, we have been researching voice biometric technologies, and over the past year or so we have had the opportunity to develop production voice biometric applications. The experience has been challenging and rewarding, and we have learned a good deal along the way.

Here are some of the more interesting issues we have worked through with key stakeholders:

The first item pertains to the definition of voice biometrics itself. Stakeholders tend to use the terms “voice authentication” and “voice verification” interchangeably, but the two have very different meanings. Authentication, which is what most people mean when they speak of voice biometrics, is the process of comparing a single caller’s voice against a set of voice models to determine whether the caller is part of that set. Verification, on the other hand, is the process of comparing a single caller’s voice to a single voice model to determine whether the caller is the person who originally enrolled in the system. While the difference between the two definitions seems small, the verification process is the more secure of the two and a bit easier to use.
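To make the distinction concrete, here is a minimal sketch in Python. It assumes each voice model can be reduced to a short list of numbers and scored with cosine similarity; the function names, the toy vectors, and the 0.9 threshold are all hypothetical and stand in for whatever a real voice biometric engine provides.

```python
import math

# Hypothetical acceptance threshold; a real engine would tune this value.
THRESHOLD = 0.9


def cosine_similarity(a, b):
    """Similarity between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(sample, enrolled_model, threshold=THRESHOLD):
    """Verification: one caller's voice against ONE enrolled voice model."""
    return cosine_similarity(sample, enrolled_model) >= threshold


def authenticate(sample, model_set, threshold=THRESHOLD):
    """Authentication: one caller's voice against a SET of voice models.

    Returns True if the caller matches any model in the set.
    """
    return any(
        cosine_similarity(sample, model) >= threshold
        for model in model_set.values()
    )


# Toy data (assumed): real voice models are far larger than three numbers.
models = {"alice": [1.0, 0.0, 0.2], "bob": [0.0, 1.0, 0.1]}
caller = [0.9, 0.1, 0.2]

print(verify(caller, models["alice"]))  # compared with one model
print(authenticate(caller, models))     # compared with every model in the set
```

Note that the set comparison performs one scoring pass per enrolled model, so both the work and the number of opportunities for a false match grow with the size of the set.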

Next, we found ourselves explaining to stakeholders that voice biometric technology does not replace the authentication process; rather, it makes it more secure. It’s commonly believed that callers can simply dial into an application, say a passphrase, and be authenticated. That is true at a high level, but it leaves out one key element. Before speaking the passphrase, the caller must provide a piece of identification, such as a password or phone number, that identifies the voice model (verification) or set of voice models (authentication) to which his voice will be compared. Without a unique identifier, it is very difficult to perform voice biometric verification or authentication with any reasonable degree of accuracy or within any reasonable amount of time.
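The role of the identifier can be sketched as a lookup that happens before any voice comparison. The Python below is an illustration only: the `check_caller` function, the toy distance-based score, the 0.8 threshold, and the sample identifiers are assumptions, not part of any particular product.

```python
import math


def toy_score(sample, model):
    """Toy similarity in (0, 1]: 1 / (1 + Euclidean distance). Assumed scorer."""
    distance = math.sqrt(sum((s - m) ** 2 for s, m in zip(sample, model)))
    return 1.0 / (1.0 + distance)


def check_caller(identifier, sample, models_by_identifier, threshold=0.8):
    """Use the caller-supplied identifier to pick the model(s) to compare.

    One model under the identifier corresponds to verification;
    several models under it correspond to authentication against a set.
    """
    candidates = models_by_identifier.get(identifier)
    if not candidates:
        # No identifier on file means there is nothing to compare against.
        return False
    return any(toy_score(sample, model) >= threshold for model in candidates)


# Hypothetical enrollment store keyed by the identifier the caller provides.
models_by_identifier = {
    "402-555-0100": [[1.0, 0.0]],              # a single enrolled voice model
    "household-17": [[1.0, 0.0], [0.0, 1.0]],  # a small set of voice models
}

print(check_caller("402-555-0100", [0.95, 0.05], models_by_identifier))
print(check_caller("unknown-id", [0.95, 0.05], models_by_identifier))
```

Because the identifier narrows the search to one model or a small set, the system never has to score the caller against every enrolled voice.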

Major Differences

We also found ourselves in discussions with stakeholders about the three different types of authentication and verification: text-dependent, text-prompted, and text-independent. The most common form used in the industry today is text-dependent. This method provides callers with a simple passphrase, such as “Please use my voice to unlock my account,” that they repeat several times during the enrollment process and then later speak for authentication or verification purposes. This method is very secure and easy to use.

The text-prompted method is a bit more complex and is used mainly as a supplemental level of security on top of the text-dependent method. To enroll through this method, a caller must repeat a passphrase several times. Then, during the authentication or verification process, she is generally asked to speak a random subset of the enrollment passphrase. The purpose of this method is to ensure that a live caller is actively participating in the process.
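As a rough illustration of the prompting step, the Python sketch below picks a random subset of words from the enrollment passphrase and checks that the caller spoke exactly those words. It assumes the subset is chosen at the word level and kept in spoken order, and that a speech recognizer (not shown) supplies the recognized words; `make_prompt` and `prompt_matches` are hypothetical names. The voice comparison itself would still run separately.

```python
import random


def make_prompt(passphrase, k, rng=None):
    """Pick k words from the enrollment passphrase, kept in original order."""
    rng = rng or random.Random()
    words = passphrase.split()
    k = max(1, min(k, len(words)))  # clamp k to a usable range
    positions = sorted(rng.sample(range(len(words)), k))
    return [words[i] for i in positions]


def prompt_matches(prompt, recognized_words):
    """Check that the caller spoke exactly the prompted words, ignoring case."""
    return [w.lower() for w in prompt] == [w.lower() for w in recognized_words]


enrollment_passphrase = "Please use my voice to unlock my account"
prompt = make_prompt(enrollment_passphrase, 3)
print("Please say:", " ".join(prompt))
```

Because the prompt changes from call to call, a caller cannot simply repeat the same utterance every time.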

The last form of voice biometric verification and authentication is text-independent. Using the text-independent model, a caller’s voice is verified or authenticated as he speaks naturally during a call. Unfortunately, this is the method that most people picture when they think of voice biometrics. I say unfortunately because it is currently one of the least-used methods, mainly because of the time it takes to enroll. Enrolling in a text-independent model can require roughly five minutes of recorded caller speech.

Many technologies in our society hold great promise, yet also have their share of implementation and public acceptance challenges. Voice biometrics is no exception. That said, voice biometrics is a robust technology that can provide added security and potentially lower costs.  

Successful deployment of this technology will depend on vendors and clients alike understanding its capabilities and limitations, using it appropriately and in line with best practices, and designing and deploying customer-friendly applications that consumers will want to use. Just as critical, vendors and clients will need a shared understanding of the terminology and definitions related to voice biometrics.

Aaron Fisher is director of speech services at West Interactive, overseeing the design, development, and implementation of speech applications for the company. He can be reached at asfisher@west.com.

