December 1, 2003
Q & A

Larry Heck, Vice President of R&D, Nuance Communications

NewsBlast
What exactly are biometrics and voice templates?

Dr. Larry Heck A biometric is defined as the automated verification or identification of a person based on physiological or behavioral characteristics. Verification of a person is based on something that the person has (ID card, badge), something that the person knows (password, PIN, personal information) and/or something that the person 'is': it is in this last method of verification that a biometric is involved. One of the better-known biometrics is a fingerprint. The voice biometric is a measurement from the voice signal that conveys the physical characteristics of the person's vocal tract as well as the behavioral idiosyncrasies of how a person speaks (e.g., word choice, prosodic patterns, etc.). The "voice template" is numerical data that consists of the most salient biometric measurements of a person's individual voice. The voice template is typically constructed in an explicit enrollment session, and stored in a database for subsequent comparisons with voice patterns during verification attempts. The voice biometric is more secure than passwords and PINs, which can be stolen, guessed or forgotten. And it has advantages over the other biometrics because it is unobtrusive and ubiquitous - voice verification can be performed over existing telephones with no requirement for special equipment (compared to fingerprint systems that require specialized scanning devices to be installed at each point-of-contact).
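The enroll-then-compare flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-number feature vectors, the averaging in `enroll`, the cosine-similarity score and the 0.9 threshold are hypothetical stand-ins, not the measurements or algorithms used in any commercial system.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors, in the range [-1, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def enroll(feature_vectors):
    """Build a template by averaging the vectors from an enrollment session."""
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]

def verify(template, attempt, threshold=0.9):
    """Accept the attempt only if it is close enough to the stored template."""
    return cosine_similarity(template, attempt) >= threshold

# Hypothetical database of templates keyed by an account identifier.
database = {"account-1": enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]])}

same_speaker = verify(database["account-1"], [1.1, 1.9, 3.0])
other_speaker = verify(database["account-1"], [3.0, 0.5, 0.2])
```

Note that only the averaged numbers are stored, not the recordings themselves, which mirrors the description of a template as numerical data.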

NB
What do you see as past and present barriers or factors affecting the accuracy of voice authentication technology?

LH The ubiquity that the telephone network provides for the data collection and transport of the voice signal also poses significant challenges to voice authentication systems. For the technology to work on the telephone network, the technology providers give up control over which device (e.g., telephone handset) or communication channel (landline, wireless) is used during the verification attempt. As a result, handset and channel variability is often introduced between the enrollment and verification sessions. This variability is called handset/channel mismatch, and it introduces "noise" into the voice signal that must be ignored by the speaker verifier. Not controlling for this variability can result in a 10-20x increase in error rates of modern voice authentication systems (as compared to situations where the user enrolls and verifies on matched handset/channel types). During the past 10 years, there has been considerable progress on reducing the impact of handset/channel mismatch. In the mid-90s, research labs including SRI International and MIT Lincoln Laboratory developed technologies that partially overcame the mismatch hurdle. Subsequently, Nuance created automated learning technologies that essentially mitigated the mismatch problem. The technology listens in on each verification session and automatically learns how to compensate for the channel/handset mismatch without the need for human intervention. Another barrier studied by SRI in the late 1990s was the inability of modern voice authentication systems to utilize idiosyncratic usage patterns. Often, a person will speak in a specific manner, using specific words and/or specific prosodic patterns (e.g., speaking rate, expressive vs. monotone speech, pitch patterns). Research in this area (currently under way at a large number of research laboratories) is leading to new technologies that collectively represent a major breakthrough in the field of voice authentication and identification.
The inclusion of these "higher-level" features of the voice biometric has led to 5-10x reductions in error rates. With the new technologies developed over the past 10 years in the field of voice authentication, systems now exist that can be successfully deployed over standard telephone networks (landline and wireless), retaining the ubiquity the telephone offers while delivering accuracy that is competitive with the other biometric technologies (e.g., face, fingerprints).
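To make the handset/channel mismatch problem more concrete, here is a small sketch of cepstral mean normalization, a standard textbook compensation technique. It is not the SRI, MIT Lincoln Laboratory or Nuance method mentioned above, whose details the interview does not give. The sketch assumes that a fixed channel shows up as a roughly constant offset added to every cepstral feature frame; the frame values and the offset are made up for illustration.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-dimension mean of an utterance from every frame.

    A constant offset added to all frames (the assumed channel effect)
    shifts the mean by the same amount, so it cancels out.
    """
    count = len(frames)
    means = [sum(column) / count for column in zip(*frames)]
    return [[value - mean for value, mean in zip(frame, means)]
            for frame in frames]

# Hypothetical two-dimensional feature frames from one utterance.
clean = [[1.0, 4.0], [3.0, 2.0], [2.0, 6.0]]

# The same utterance heard through a different (hypothetical) handset.
channel_offset = [0.5, -1.5]
shifted = [[value + offset for value, offset in zip(frame, channel_offset)]
           for frame in clean]

normalized_clean = cepstral_mean_normalize(clean)
normalized_shifted = cepstral_mean_normalize(shifted)
```

After normalization the two versions of the utterance line up, so a comparison between enrollment and verification sessions is no longer dominated by the handset.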

NB
Is voice an area of biometrics that is used and supported by government agencies? What other types of biometrics are used by the government?

LH Yes, government agencies are looking at using voice authentication and biometrics in tandem with other methods, including badges, smart cards and passwords. The Department of Defense and National Security Agency (NSA) are currently examining using biometrics, including voice, to authenticate employee access to specific applications. And the Social Security Administration is piloting a voice authentication program. In January 2003, the SSA began testing voice to authenticate users of the W-2 wage-reporting system. Basically, an employee needing to access the system on behalf of an employer applies online. Then, the SSA system sends an e-mail message to the employee's supervisor asking permission. That's when the supervisor is pointed to the SSA Web site, which then calls the supervisor. After he/she reads a few words on the SSA Web page over the phone, the system records and stores the supervisor's voice template for future reference. From then on, each time the employee wants to make a transaction, the system will call the supervisor and verify his or her identity. Also, the government is looking into biometrics for tracking activity at border crossings - fingerprinting specifically. And select airports are using hand geometry, which allows people to "enroll" their hand in the system via a scanner. The scanner records the outline of the hand, its position and the lengths of the fingers.

NB
We often hear the term "surveillance" in connection with biometrics and voice authentication. Is voice authentication then a monitoring technology?

LH Voice authentication can be used as a monitoring technology. One example is home incarceration systems that use voice authentication. Surveillance in this scenario is not about the classic "listening in on calls" or "wiretapping," but about monitoring a person's whereabouts. For instance, Louisiana-based ShadowTrack uses voice authentication to track pre- and post-trial offenders in Louisiana and select locations in California, Arizona, Nebraska and Minnesota. In this application, the offender enrolls their voice into the system via phone and the software encrypts the voice template into a form that cannot be hacked or copied. After that, the parole officer can monitor the individual's location/check-in history 24/7. The system calls a person's home at random times, and for each call, the person must (a) pick up before some number of rings, (b) listen to the prompts and repeat a few random phrases into the phone, and (c) get successfully verified by the voice authentication system. Anti-fraud features prevent the offender from using a tape recording of their voice or forwarding their home phone to a mobile phone. And in the UK, there's a program called the "Intensive Supervision and Surveillance Program" that identifies and tracks youth offenders. Voice authentication is being used by the Youth Justice Board and piloted by the National Probation Service to ensure that teens are either at home or at a specific, pre-approved location (job, training) at specific times. The appeal of voice authentication for law enforcement is that instead of expensive high-resolution cameras, scanners or ankle bracelets, all that's needed is a voice and a phone.
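The three check-in conditions (a), (b) and (c) described above can be written out as a short sketch. The `CheckInCall` record, the ring limit of 4 and the score threshold of 0.8 are hypothetical values chosen for illustration; the interview does not state the actual limits or how any deployed system represents a call.

```python
from dataclasses import dataclass

@dataclass
class CheckInCall:
    """Outcome of one random call to the monitored person's home phone."""
    rings_before_pickup: int   # how many rings before the call was answered
    phrases_repeated: bool     # did the person repeat the prompted phrases?
    voice_score: float         # similarity score from the voice verifier

def check_in_passed(call, max_rings=4, score_threshold=0.8):
    """Return True only if all three conditions of the check-in are met."""
    answered_in_time = call.rings_before_pickup <= max_rings   # condition (a)
    completed_prompts = call.phrases_repeated                  # condition (b)
    voice_verified = call.voice_score >= score_threshold       # condition (c)
    return answered_in_time and completed_prompts and voice_verified

# A compliant check-in, a late pickup, and a failed voice match.
on_time = check_in_passed(CheckInCall(2, True, 0.93))
late_pickup = check_in_passed(CheckInCall(7, True, 0.93))
wrong_voice = check_in_passed(CheckInCall(2, True, 0.41))
```

Requiring all three conditions together is what lets a single phone call stand in for a location check: answering shows someone is at the home number, and the voice match shows who it is.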

NB
What types of businesses are using voice authentication? How comfortable are consumers with voice authentication?

LH Many consumer-facing businesses are developing voice authentication applications. In fact, industry analyst firm Celent Communications expects the biometrics market to exceed $1 billion by 2004, with voice authentication making up about 10 to 15 percent of the total market. Banks and financial institutions such as Banco Bradesco are already using voice authentication to verify caller identity before allowing access to account information. Also, since the cost of resetting a customer PIN or password with a live agent can be as much as $12 each time, many companies are using voice authentication for their PIN reset systems, such as when a customer loses an ATM code. There's even Chiffon, a global gambling exchange, that uses verification technology to identify and authenticate exclusive clients among its millionaire customer base. Also, research shows that as consumers grow increasingly familiar with speech, their comfort level and willingness to use the application increase as well. A recent Touchpoint survey queried 600 respondents (only five percent had been exposed to voice authentication in the past). Of those surveyed, the majority thought voice authentication was more convenient to use than a PIN, and an overwhelming majority thought voice authentication was as secure as or more secure than using a PIN.

NB
It's common knowledge that passwords and PINs can be stolen. But how secure is voice authentication technology as a biometric? Can it be hacked? What about imposters?

LH Voice authentication has been studied, tested and analyzed by researchers - and clearly, major breakthroughs in voice authentication have boosted its status within the biometrics field. Voice authentication has been studied for decades, but the majority of advancements have come during the last three decades, with sustained funding of commercial and university labs primarily by the U.S. government. And the last 10 years have seen tremendous breakthroughs in making voice authentication work over different telephone channels. Voice has come a long way: a recent study commissioned by the Communications Electronics Security Group in the UK indicated that voice performed better than the fingerprint, face and hand geometry systems tested for security/safety. A typical voice template requires 10-20 kilobytes of disk space, so an application with one million users requires just 10-20 gigabytes of storage. Voice authentication technology can use as little as 0.5 seconds of speech to authenticate a caller - in real time. And because voice templates do not contain the original speech signal but rather are extracted parameters specific to the underlying algorithms, they cannot be hacked or "reverse engineered" to gain entry to a system.
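The storage figure quoted above follows from simple multiplication, which this short sketch reproduces. The function name is made up for illustration, and decimal units (1 gigabyte = 1,000,000 kilobytes) are assumed, since the interview does not say which convention is meant.

```python
def template_storage_gb(users, kilobytes_per_template):
    """Total template storage in gigabytes, using decimal units."""
    kilobytes_per_gigabyte = 1_000_000
    return users * kilobytes_per_template / kilobytes_per_gigabyte

# One million users at the low and high ends of the quoted template size.
low_estimate = template_storage_gb(1_000_000, 10)    # 10.0 GB
high_estimate = template_storage_gb(1_000_000, 20)   # 20.0 GB
```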
