2024 State of AI in the Speech Technology Industry: Voice Biometrics Both Profits From and Is Plagued by AI


Security experts have increasingly relied on various types of biometrics, from fingerprints to facial recognition, from retinal scans to voice biometrics, to positively identify authorized users and thwart fraudsters trying to gain access to locations, accounts, and computer systems. Sometimes these biometrics are used as the sole identifier; other times they are used with other identifiers as part of a multifactor identification procedure.

And while it has its flaws, voice has long been the preferred biometric factor because it is much more convenient for the customer.

Nearly everyone has a smartphone with them most of the time, so voice biometrics provide a quick, easy method of identification, says Robert Wakefield-Carl, senior director of innovation architects at TTEC Digital.

And now, with the addition of artificial intelligence, voice biometrics needs fewer data inputs to positively identify a person, says technology and research analyst Jon Arnold. AI also makes it possible for speech technology vendors to add more languages, dialects, colloquialisms, etc., more quickly to their voice biometrics engines. “The capabilities become greater as more AI is layered into it. It can be more accurate and work faster.”

But that is a double-edged sword. With voice biometrics, it has become a battle of good AI vs. evil AI. On the one hand, AI helps voice biometrics perform more securely; on the other, bad actors have used AI to develop deepfakes capable of fooling both humans and voice recognition systems to compromise accounts.

The Year in Review

AI became more of a threat than a solution for biometrics in 2023, according to several experts.

In the past, spoofed voices have been fairly easy to catch. They didn't sound natural in tone or pacing. But fast-advancing technology, particularly involving generative AI, is making it much harder to separate synthesized voices from the real thing. In fact, a number of new online tools today allow almost anyone to upload a few seconds of a person's voice, type in what they want it to say, and quickly create a near-humanlike voice clone that can be used in many situations. In the past few years, there have been a number of highly publicized cases of fraudsters using deepfakes to transfer large sums of money out of bank accounts or for other scams.

The biggest impact on voice biometrics in 2023 fell on the negative side of the ledger, thanks to deepfakes, according to Wakefield-Carl. "AI is being used to circumvent traditional voice biometrics."

“The pace of innovation is being accelerated, and unless you have the capability of operating continuously and have enough specialized manpower and tools to keep up in this race, the system you will be working with will not be secure,” Victor Gomis, sales executive for security and biometrics at Nuance Communications, told the European Association for Biometrics in late November.

Gomis added that easy access to AI and other technology that can quickly generate deepfakes means that legacy synthetic voice detection methods can no longer meet the challenge.

And it's a real fear affecting many businesses, especially in high-risk verticals like healthcare, insurance, and financial services.

A recent Daon survey revealed that 62 percent of IT executives in B2B companies were concerned about the security threats posed by AI and deepfakes.

“There’s a trust issue online right now because of these deepfakes,” says Conor White, president of new industries at Daon. “We began to see these deepfake generators cloning everyone from actors and politicians to business executives and even someone’s grandchild.”

The most sophisticated deepfake generators produce voices in real time that fool not only the human ear but also most interactive voice response authentication systems, according to White. The most sophisticated hackers combine these hyper-realistic voices with identification credentials (address, phone, etc.) gathered from the internet to attempt to defeat knowledge-based verification processes. Generative AI not only produces realistic-sounding voices; it also has a natural, conversational output.

“It can sound like a Caucasian man, age 22, or a female or whoever and provide accurate answers back to the contact center agent,” White says. “That’s a very sophisticated attack. That’s very hard to guard against. What really scares me is the ability to hyperscale. Up to now, you had to hire more people to do more hacking. But with a computer, you don’t have to do that. You start one new process, then another new one, then another. The computer can make hundreds of calls at the same time. When you add more computers, then you have a tsunami.”

Several companies, including Daon, have spent a lot of time and resources developing technologies to detect machine-generated voices. And now, with the very real threat posed by deepfakes, voice biometrics providers will be focusing on developing and promoting what they say are technologies that can thwart these types of attacks. For companies like Daon, Resemble AI, and ID R&D, this is a central part of their product road map.

Daon, for example, launched xSentinel, an expansion of its AI.X technology, at the end of 2023.

xSentinel uses adaptive synthetic voice protection to create a layer of defense within voice and other communication channels and enhance the identity verification technologies within the Daon IdentityX and TrustX platforms.

xSentinel generates a signal of potential fraud that is more accurate than traditional audible cues, effectively stripping away the advantage that cloned voice generators provide for bad actors, according to Daon’s marketing materials. Seconds after a caller begins speaking, proprietary algorithms detect cues that could indicate a digitally generated voice. xSentinel provides the data necessary for organizations to build reliable synthetic voice protection steps into their contact center processes.

The AI-powered synthetic voice detection built into xSentinel keeps pace with the changing fraud landscape by using machine learning to continually adjust and improve its ability to detect synthetic speech cues. It also offers a real-time signaling system for early in-call detection and is language- and dialect-agnostic.
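Daon's algorithms are proprietary, so the details of xSentinel's cues aren't public, but the general idea of scoring a caller's earliest audio frames for signs of machine generation can be sketched with a toy example. The spectral-flatness feature, the threshold, and the function names below are purely illustrative stand-ins chosen for this sketch, not anything Daon has disclosed:

```python
import numpy as np

def spectral_flatness(frame: np.ndarray) -> float:
    # Ratio of geometric mean to arithmetic mean of the power spectrum.
    # Natural voiced speech is harmonic (energy piled into a few bins),
    # so it scores near 0; noise-like audio scores closer to 1.
    spectrum = np.abs(np.fft.rfft(frame)) ** 2 + 1e-12
    return float(np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum))

def flag_suspicious(frames, threshold=0.4) -> bool:
    # Flag the call if the average flatness across the early frames
    # exceeds the threshold -- a crude stand-in for the learned
    # "cues" a production detector would use.
    scores = [spectral_flatness(f) for f in frames]
    return sum(scores) / len(scores) > threshold

# Toy signals: a harmonic "voiced" frame vs. unstructured noise.
t = np.arange(512) / 8000.0
voiced = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 240 * t)
noise = np.random.default_rng(0).standard_normal(512)

print(flag_suspicious([voiced]))  # harmonic, speech-like: not flagged
print(flag_suspicious([noise]))   # flat spectrum: flagged
```

A real system would replace the single hand-picked feature with a trained model over many acoustic cues, which is what lets it stay language- and dialect-agnostic and adapt as generators improve.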

In addition to xSentinel, Daon offers xVoice, which combines passive and active voice matching algorithms and anti-spoofing technology, including replay, synthetic speech, and distorted and continuous voice detection, to detect fraud.

“As the capabilities of AI are expanding at an incredible rate, we don’t want to say it won’t ever be an issue, but the anti-spoofing technology we use for xVoice is already designed to detect playback of any kind,” Daon says on its website. “Our scientists are also actively working to develop algorithms specifically designed for battling AI.”

Another powerful recent addition on this front is Nuance's Gatekeeper technology, which also relies on evolving AI technology to protect against deepfakes. Gatekeeper uses deep neural networks to make a single, transparent authentication decision for every engagement. It considers not just the content of the conversation, including voice characteristics and speaking patterns, but also the device, network, location, and other factors. These checks happen seamlessly in the background of each engagement.

“We conduct regular testing against the latest AI voice cloning and text-to-speech models on the market, and we use a multilayered approach to prevent attacks,” Brett Baranek, vice president and general manager of security and biometrics at Nuance, wrote recently in a company blog post. “We don’t just check to see if an incoming voice sounds close enough to the voice of the customer; we analyze hidden artifacts in the audio that may indicate it’s been digitally manufactured or is being played from a recording. To deliver an authentication verdict, our AI Risk Engine incorporates a variety of other important signals about the caller, the voice, the audio signal, and the call itself.”

ID R&D, a provider of liveness detection and voice biometrics, has quickly become a leader in addressing AI-powered fraud by combining passive facial liveness and voice anti-spoofing technologies.

“True to our name, research is a strategic priority at ID R&D, and our products are built upon investigation and discovery of new ways to reduce fraud without burdening users,” said Alexey Khitrov, CEO and cofounder of ID R&D, in a statement.

"Rapid advancements in artificial intelligence bring new challenges to securing digital onboarding and authentication, particularly from deepfakes, and we expect this trend to persist," Khitrov added. "Fortunately, we are also working hard to research and develop new ways to leverage AI to its full potential to counter AI-powered fraud."

Resemble AI Detect can validate the authenticity of audio data to expose speech deepfakes in real time with up to 98 percent accuracy, according to the company.

“Our mission is to make interactions with digital products as human and natural as possible,” says Zohaib Ahmed, cofounder and CEO of Resemble AI. “We want to make AI tools available to more people, but we also acknowledge that we can’t talk about AI without talking about ethics.”

With all that in mind, experts predict that AI-based voice biometrics will continue to grow during the year. Companies using voice biometrics like the depth that AI provides to the technology, according to Maurice Kroon, founder and CEO of Vox AI. “It’s not just on the surface. In the past, we used to need several minutes of audio to produce a voiceprint. Now there are several companies that provide these in several seconds. We can also get down to the point that it’s a single phrase used by everyone.”

Among the companies considering adding AI-supported speech biometrics is xcoins.com, a cryptocurrency exchange platform. The company is already using know-your-customer and document verification techniques to help secure transactions.

“It seems very logical for us to use speech biometrics since we have customers that make large-transaction purchases. Once we train the AI agent on the [speech] data, it would provide more authentic verification of the transaction call,” says Brian Prince, cofounder and chief marketing officer of xcoins and founder and CEO of TopAITools.com, an AI tool and educational hub providing resources for navigating the AI landscape.

AI-enabled voice biometrics is just another security element because there are so many ways to try to fool different security systems, Prince says. “We want to do everything that we can to reduce fraud. Even if it’s a 1 percent difference, that’s a dramatic amount for our business. It’s just another tool in the toolbox that we can utilize.”

And while a lot of progress has already been made, experts agree that deepfake attacks will continue to evolve. In response, AI-based voice biometrics solutions will need to continue to improve to stay ahead of those looking to use AI to thwart them. 

Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises1@comcast.net.
