Speech Technology Magazine

 

The Meaning Should Be Clear When Choosing the Words

By Judith Markowitz - Posted Nov 30, 2000

WHAT IS VOICE RECOGNITION?
When you examine the individual words that make up the phrase, you are likely to decide that it refers to the ability to hear a voice and say, "That sounds like Susan!" When machines can do this, it is called speaker identification or speaker recognition. These phrases focus on the person, but what is being examined and recognized is, indeed, that person's voice.
Strangely enough, the term voice recognition has become such a widely used synonym for speech recognition that it is becoming the preferred name for the industry itself. This linguistic phenomenon is a bit strange because speech recognition recognizes words rather than voices, but since voices utter words, one can easily imagine how the shift occurred. What is really perplexing is that, for as long as I can remember, the industry has been bent on making speech-recognition technology completely speaker (that is, voice) independent. In view of this, the blithe acceptance of voice recognition by speech-recognition professionals is what a psychologist would call cognitive dissonance. As the cognitive dissonance spreads and deepens, we may create enough angst for at least one scene in a Woody Allen movie. Who knows where that could lead?
The most disturbing thing about the use of voice recognition to mean speech recognition is that it is misleading and confusing. It encourages industry outsiders to believe that speech recognition and speaker verification are the same thing. I have encountered a distressing number of people whose investigations of speaker verification have led them to NaturallySpeaking and ViaVoice, which are speech-recognition products for dictation. It is not surprising that many of them also report that speaker-verification technology does not work.
It is my contention that every speech-recognition professional who insists on using the term voice recognition has a responsibility to make it clear that speech recognition is NOT a voice-based biometric, so that customers do not try to use speech recognition to protect their computer systems and other valuables.
One way to present the different roles of speech recognition and speaker verification in a manner that people are likely to remember is to use an analogy. I like analogies because the process of transporting a set of relationships from one domain to another, quite different realm can force you to re-examine the concepts involved. Besides, they can be fun.
Here is how a car analogy might be used. Speech recognition is like the steering wheel, gas pedal and brake because they are the tools that you use to get where you want to go. You can't use speech recognition (or voice recognition) to secure your application any more than you can use your steering wheel to secure your car. In fact, speech recognition, like the steering wheel, is an invitation to drive.
Speaker verification is like the key that unlocks your car so that you can get in and drive. You can also think of speaker verification as being like putting The Club on your steering wheel. You can't use speaker verification to navigate through an application any more than you can use The Club to drive your car.
This analogy can be extended to include text-to-speech synthesis. You could say that speech synthesis is like the odometer and other gauges of the car, which tell you what the application/car is doing. Those of you who are concerned about human factors could also point out that speech synthesis is like the paint color, size and other form factors of the car because those are the things that cause people to love or hate the entire package. For example, a friendly, perky voice could be likened to a pink Cadillac. Some people are delighted and charmed, while others reject the combination as annoying or garish and would certainly never consider it for business.
This is just one example of how analogies can clarify distinctions and help us to see the familiar in new ways.


Judith Markowitz is the associate editor for Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She recently completed a market analysis of speaker verification and identification. She can be reached at (773) 769-9243 or Jmarkowitz@pobox.com
