Q&A: Jeffrey Hopper, vice president of client services at Lumenvox, on Voice Biometrics


Voice biometrics have been in use for years, but their capabilities are still limited. James Larson, co-chair of the SpeechTEK conference, talked to Jeffrey Hopper, vice president of client services at Lumenvox, about these limitations and how they can be overcome, topics Hopper will cover in his SpeechTEK 2020 presentation.

Why must users enroll before biometric software can recognize them?

Every biometric system requires a template, created at enrollment, to match new samples against. Typically, there are two approaches to enrollment:

1) Active: This approach is text-dependent and requires the user's engagement. The user must take explicit steps to enroll, which is often considered onerous because it requires a specific set of prompts and responses and, therefore, the user's time.

2) Passive: This approach is not text-dependent; instead, enrollment is built either from live voice captured during a call or from existing recordings of previous interactions. Passive enrollment requires less effort from users and can significantly improve enrollment rates.
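Both approaches end in the same place: a stored template that later samples are scored against. The sketch below is a minimal illustration, not a description of Lumenvox's engine. It assumes each utterance has already been turned into a fixed-length speaker embedding by some upstream model (not shown), builds the template by averaging normalized embeddings (a common but not universal technique), and compares a new sample using cosine similarity against a hypothetical threshold of 0.7.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_template(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Enroll a speaker by averaging unit-normalized per-utterance embeddings.

    Assumes every embedding came from the same (hypothetical, upstream)
    speaker-embedding model and therefore has the same length.
    """
    if not embeddings:
        raise ValueError("enrollment needs at least one sample")
    normalized = [l2_normalize(e) for e in embeddings]
    dim = len(normalized[0])
    if any(len(e) != dim for e in normalized):
        raise ValueError("embeddings must all have the same length")
    mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def verify(template: Sequence[float], sample: Sequence[float], threshold: float = 0.7) -> bool:
    """Accept the sample if it is close enough to the enrolled template.

    The 0.7 default is a placeholder; real systems tune it on data.
    """
    return cosine_similarity(template, sample) >= threshold
```

For active enrollment the embeddings would come from the prompted phrases; for passive enrollment, from segments of live or recorded calls. In either case the threshold would be tuned on real data to trade false accepts against false rejects.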

Can a twin fool a voice biometrics system into recognizing him as the other twin?

Yes! Nature has its own method of beating DNA tests: identical twins share the same DNA. Even birth mothers can't always tell identical twins apart, and biometrics can't claim to know more than a mother or nature. The truth is that all biometric systems have limitations, whether the modality is voice, face print, fingerprint, or something else. That said, twins are not a typical attack vector in most contact centers or self-service environments. The real risk comes from any group of users who make false identity claims to gain an advantage over a business. Twins generally aren't going to try to circumvent these measures, and the number of identical twins collaborating to commit fraud is likely very small compared to the overall population of fraudsters.

Can a biometric recognition system be fooled by a recorded voice?

The short answer is yes, if the system has no proper countermeasures. With countermeasures, the risk is lower and the protection more robust. Countermeasures include prompting for real-time responses, which can be randomized so they are difficult, if not impossible, to pre-record, and sending one-time PINs to known phone numbers or email addresses. Detection algorithms can also identify reused samples or the artifacts introduced by playing back a recording. However, a perfect, previously unseen recording injected directly into the audio channel is impossible to detect. All technology has limitations, including biometrics: the iPhone's fingerprint and face recognition were both fooled shortly after their release. The truth is that all biometric security involves balancing risk with usability.
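The randomized-prompt countermeasure can be sketched as a simple challenge-response check. This is an illustration under stated assumptions: `issue_challenge` and `check_response` are hypothetical names, the transcript is assumed to come from a speech recognizer that writes spoken digits as 0-9, and the 10-second response window is an arbitrary placeholder. The sketch only confirms that the caller said the right thing in time; the response audio would still need to be matched against the caller's enrolled voice.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional

DIGITS = "0123456789"


@dataclass
class Challenge:
    digits: str        # the random digit string the caller is asked to say
    issued_at: float   # time.monotonic() value when the prompt was played


def issue_challenge(length: int = 6) -> Challenge:
    """Create an unpredictable prompt so a caller can't pre-record the answer."""
    # secrets draws from a cryptographically strong source of randomness.
    digits = "".join(secrets.choice(DIGITS) for _ in range(length))
    return Challenge(digits=digits, issued_at=time.monotonic())


def check_response(
    challenge: Challenge,
    transcript: str,
    now: Optional[float] = None,
    max_age_s: float = 10.0,  # hypothetical response window
) -> bool:
    """Check that the recognized response matches the prompt and arrived in time.

    Characters other than the ASCII digits 0-9 (spaces, punctuation) are ignored.
    """
    now = time.monotonic() if now is None else now
    if now - challenge.issued_at > max_age_s:
        return False  # too slow: a stale answer is treated as a failure
    spoken = "".join(ch for ch in transcript if ch in DIGITS)
    # Constant-time comparison avoids leaking how many digits matched.
    return secrets.compare_digest(spoken, challenge.digits)
```

A fixed passphrase could be recorded once and replayed forever; a fresh random string per call forces an attacker to produce the exact digits live, within the window.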

Is it possible to continuously apply biometrics to verify that the original speaker is not replaced by a second speaker?

Absolutely, especially in passive conversations. This is typically done by reauthenticating the speaker at predetermined intervals during the call and displaying each confirmation on the contact center agent's desktop as it occurs.
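Periodic reauthentication amounts to scoring each new window of call audio against the enrolled template. The sketch assumes per-window speaker embeddings are produced upstream by an unspecified model; `monitor_call`, the 15-second interval, and the 0.7 threshold are hypothetical placeholders chosen for illustration, not vendor defaults.

```python
import math
from typing import List, NamedTuple, Sequence


class WindowResult(NamedTuple):
    end_time_s: float  # seconds from call start when this window's check ran
    score: float       # cosine similarity to the enrolled template
    verified: bool     # True if the score met the threshold


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction: treat as no match
    return dot / (norm_a * norm_b)


def monitor_call(
    template: Sequence[float],
    window_embeddings: Sequence[Sequence[float]],
    interval_s: float = 15.0,  # hypothetical reauthentication interval
    threshold: float = 0.7,    # hypothetical acceptance threshold
) -> List[WindowResult]:
    """Score each consecutive window of call audio against the template.

    Each result could be pushed to the agent desktop as it is produced; a
    failed window suggests the speaker may have changed.
    """
    results = []
    for i, emb in enumerate(window_embeddings):
        score = cosine_similarity(template, emb)
        results.append(WindowResult((i + 1) * interval_s, score, score >= threshold))
    return results
```

Reporting every window, rather than only failures, lets the agent see that verification is still holding as the call goes on.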

What should happen if a speaker is not recognized because he has a cold, sore throat, or laryngitis?

There should always be a fallback method. Other modalities face the same problem: a customer who has injured her finger may no longer be able to use a fingerprint reader, and 2D face recognition won't work in the dark. Although some algorithms can cope with degradation in the voice, there is a limit. Additional authentication methods, such as a one-time PIN via email or SMS or knowledge-based questions, should be available for times when the caller's voice is greatly impaired.
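One way to structure the fallback is an ordered chain: accept the voice match if it clears the threshold, otherwise try each alternative method in turn. This is a minimal sketch with hypothetical names; each fallback is modeled as a callable that reports whether the caller passed (for example, by comparing an entered one-time PIN with the one sent), and the threshold value is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

# (method name, check) pairs; each check returns True if the caller passed it.
Fallback = Tuple[str, Callable[[], bool]]


def authenticate(
    voice_score: Optional[float],
    fallbacks: Sequence[Fallback],
    threshold: float = 0.7,  # placeholder voice-match threshold
) -> Optional[str]:
    """Return the name of the method that authenticated the caller, or None.

    voice_score is None when no usable voice sample was captured at all.
    """
    if voice_score is not None and voice_score >= threshold:
        return "voice"
    # Voice failed or was unavailable (e.g. laryngitis): try alternatives in order.
    for name, check in fallbacks:
        if check():
            return name
    return None


# Hypothetical usage: a hoarse caller scores low on voice but enters the SMS PIN.
sent_pin = "731905"     # hypothetical one-time PIN sent by SMS
entered_pin = "731905"  # what the caller keyed in
method = authenticate(0.42, [("sms_otp", lambda: entered_pin == sent_pin)])
print(method)  # sms_otp
```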

Does biometrics software use much computer power?

It depends on the use case. Voice biometrics generally involves a set of complex signal-processing and machine-learning tasks, but a single authentication can be handled quickly and isn't particularly computationally intensive. Some tasks, such as batch enrollments from call recordings or offline comparisons for fraud identification, might require higher-resource servers. Running continuous verification on hundreds of concurrent callers in a call center requires properly sized servers, but nothing too burdensome.
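For continuous verification, a back-of-envelope sizing estimate is concurrent calls × CPU seconds per check ÷ seconds between checks, padded for headroom. The function and all numbers below are made-up placeholders, not Lumenvox figures; the per-check cost in particular would have to be measured on the actual engine and hardware, and the formula assumes checks are spread evenly over time.

```python
import math


def cores_needed(
    concurrent_calls: int,
    cpu_seconds_per_check: float,
    interval_s: float,
    target_utilization: float = 0.6,
) -> int:
    """Rough CPU-core estimate for continuous verification.

    Each call triggers one check every interval_s seconds, so on average
    concurrent_calls * cpu_seconds_per_check / interval_s cores are busy.
    Dividing by target_utilization leaves headroom for bursts.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy_cores = concurrent_calls * cpu_seconds_per_check / interval_s
    return math.ceil(busy_cores / target_utilization)


# Hypothetical inputs: 500 callers, 0.2 CPU-seconds per check, a check every 15 s.
print(cores_needed(500, 0.2, 15.0))  # 12
```

With these placeholder inputs about 6.7 cores are busy on average, and targeting 60% utilization rounds up to 12, which is consistent with the point that continuous verification needs sizing but not exotic hardware.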

To see presentations by Jeffrey Hopper and other speech technology experts, register to attend SpeechTEK 2020.
