It All Depends

I have always been fascinated by the ability of purely descriptive terms to assume strong positive or negative connotations. In the mid-1980s, I was intrigued when the term "isolated" (as in "isolated-word recognition") acquired negative overtones and was replaced by the less pejorative and more oblique (dare I say more discreet?) word "discrete" as the way to describe systems that require users to pause between words.

More recently, I have been monitoring the connotative temperature surrounding the terms "speaker dependent" and "speaker independent." These two expressions describe whether or not characteristics of an individual's voice are captured and/or represented in a speech system.

Speech Recognition

In speech recognition, "speaker dependent" and "speaker independent" stand in opposition to each other, both in meaning and in emotional impact. Those of you who are familiar with speech-recognition technology are, no doubt, aware of the use of the term "speaker dependent" to describe speech-recognition technology that requires enrollment. A true speaker-dependent speech-recognition system needs to have at least one spoken sample of every item in its vocabulary from each user. In contrast, "speaker independent" properly describes a system that does not require a speaker to enroll in any way.

The aversion to enrollment and other factors have given the term "speaker dependent" an increasingly negative connotation and blessed "speaker independent" with a strongly positive affect.

This is unfortunate, because good speaker-dependent models can outperform speaker-independent models.

One intriguing by-product of this connotative realignment is that the term "speaker independent" is supplanting the term "speaker adaptive."

Speaker-adaptive systems use information gathered from a speaker to "tune" the system to the voice/speech of that person.

For example, the Dragon NaturallySpeaking continuous-speech dictation product is a speaker-adaptive recognition system because it requires users to read a predefined text before using the system. Even so, the use of "speaker independent" to characterize superior performance is common in speech recognition.

Voice ID

The situation is quite the opposite for voice ID. A voice ID system (speaker verification, speaker identification, or speaker separation) is inherently speaker dependent. It is essential to have exposure to a person's voice before a system (or a human listener, for that matter) can determine the identity of the speaker.

The same is true for all other biometrics: they are all person dependent. Without a previously captured sample from that person, there is no way a fingerprint can be identified as belonging to (or not belonging to) a specific individual.

Consequently, "speaker independent" and "person independent" have no real meaning in biometrics. If anything, they might be used as veiled suggestions that a system does not work. In contrast, "speaker dependent" is a strongly positive descriptor that is used to tout some speaker-verification products.

The irony is that the speech recognition and voice ID/biometrics industries are beginning to communicate their messages to the same marketplaces. What is the effect of these conflicting connotations?

Probably the best answer is that "it all depends..."

Judith Markowitz is president of J. Markowitz Consultants and can be reached at Northwestern University/Evanston Research Park, 1840 North Oak, Evanston, IL 60201, or by e-mail at jmarkowitz@pobox.com.

