Speaker Authentication: Exploding Some of the Myths


Speaker authentication (SA) is a biometric technology that ensures that people are who they claim to be. Like automatic speech recognition (ASR), it is a computerization of a universal human ability—a skill that enables us to establish social and personal bonds and to distinguish friends from foes.

Because it resembles human behavior, SA is perceived as less alien than some other biometric technologies, such as iris scanning. Because it is an everyday skill, SA is less threatening than biometrics that are associated with criminals, like fingerprinting. But familiarity with the human behavior that spawned automated SA doesn’t eliminate basic questions about what it is and how it works. Here are four of the most frequently asked questions:

Why can’t I just use speech recognition? This question is a natural byproduct of the human ability to simultaneously recognize the speaker and that person’s speech. Also, confusion arises from the fact that the term voice recognition is often used as a synonym for ASR.

Unlike humans, ASR and SA are specialists. The job of ASR is to decipher what a person is saying. It is not designed to look at who is speaking. Most ASR is speaker-independent, which means that it contains as little speaker-specific information as possible so that it can recognize the speech of a broad spectrum of speakers. This means that when ASR technology is used to authenticate users it is strictly a spoken variant of typed authentication and contains no biometric capabilities. The differences between ASR and SA make them good partners. That’s why they are often used together in applications.

Can it handle mimics? It is much easier for a professional mimic to fool other people than to fool an SA system. The reason is that mimics can imitate a speaker’s style and may modify their voices to approximate another person’s pitch patterns (that’s what fools us), but they can’t change their physiology or anatomy. Even though SA systems examine some aspects of style (which makes it harder for identical twins to fool SA), most of the information processed by SA is physiological and anatomical, specifically the size and shape of the person’s vocal tract (the throat, mouth, and nose).

What about tape recorders? Replay attacks (also called tape attacks) occur when someone records the voice of a person authorized to use an SA system and then tries to fool the system by playing that recording into the phone. The main way to resist replay attacks is called liveness testing. The most common form of this is challenge-response, also called text prompting. Challenge-response asks the person to respond to randomly generated requests. Sometimes it will involve repeating a sequence of words or digits, for example “24, 75” or “yellow, green, blue.” The randomness of the selections makes it very difficult to use a tape recorder to select them and play them back quickly and naturally. Some systems use a knowledge question, such as “What street did you live on when you were 5?” Others ask for random patterns or pose questions whose answers the authorized user has never spoken to the system before, such as “What is today’s date?”
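The challenge-response prompting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list, the function names, and the idea that a speech recognizer hands back the caller's reply as a list of tokens are all hypothetical, not taken from any particular SA product.

```python
import secrets

# Hypothetical prompt vocabulary; a real system would choose its own.
WORDS = ["yellow", "green", "blue", "red", "white", "black"]


def make_digit_challenge(pairs=2):
    """Return random two-digit numbers as strings, e.g. ['24', '75']."""
    return [str(secrets.randbelow(90) + 10) for _ in range(pairs)]


def make_word_challenge(count=3):
    """Return `count` distinct words drawn at random from WORDS."""
    pool = list(WORDS)
    chosen = []
    for _ in range(count):
        # Remove each pick from the pool so no word repeats.
        chosen.append(pool.pop(secrets.randbelow(len(pool))))
    return chosen


def response_matches(challenge, recognized):
    """Check that the recognized tokens repeat the prompt in order.

    `recognized` is assumed to come from a speech recognizer; this
    checks only what was said, not who said it.
    """
    return [t.lower() for t in recognized] == [t.lower() for t in challenge]
```

A check like `response_matches` covers only the content of the reply. The SA system would still have to compare the voice itself against the caller's stored voice model.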

Another commonly used technique assesses whether the new input is suspiciously similar to the stored voice model for a user. If so, the SA may send an alarm to the application. This approach is generally used by systems that use spoken passwords or a universal password/phrase shared by all authorized users, such as “verification by bank ABC.”
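A rough sketch of that idea: a score that is too low is rejected, but a score that is too close to perfect is also treated as suspect, because live speech never repeats exactly. Everything here is an assumption made for illustration, including the use of cosine similarity over plain lists of numbers, the function names, and both threshold values; real SA systems use their own features and scoring.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assess_attempt(new_features, stored_features, accept=0.80, replay=0.995):
    """Classify an attempt as 'reject', 'accept', or 'alarm'.

    The thresholds are illustrative assumptions, not recommended values.
    """
    score = cosine_similarity(new_features, stored_features)
    if score >= replay:
        return "alarm"   # suspiciously similar: possible recording
    if score >= accept:
        return "accept"  # close enough, with natural variation
    return "reject"      # does not resemble the stored model
```

With these example thresholds, `assess_attempt([1, 2, 3], [1, 2, 3])` raises the alarm because the input matches the stored data exactly, while a close but imperfect match is accepted.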

Will it know me if I have a cold? Most of the analysis done by SA reflects the size and shape of the speakers’ vocal tracts, which doesn’t change much when they have a cold or the flu. Like ASR, however, SA relies on having rich acoustic data, meaning lots of voiced sounds such as m, l, z, or b. When those sounds are reduced due to illness, there is less information available for analysis. If the cold is severe and there is laryngitis, the accuracy of SA, like that of ASR and human listeners, is likely to be affected.

If you have more questions about SA, feel free to email me, visit www.jmarkowitz.com, or ask for my white paper on myths and misunderstandings about speaker authentication.


Judith Markowitz is the technology editor of Speech Technology magazine and is an independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
