Deepfakes: The Latest Trick of the Tongue
“WORDS: So innocent and powerless as they are, as standing in a dictionary, how potent for good and evil they become in the hands of one who knows how to combine them.” —Nathaniel Hawthorne
This master of American literature wrote those words back in the 19th century, but they sound prescient today. Hawthorne could almost be warning about something he couldn’t possibly have foreseen: the dangers of speech deepfakes in the 21st century.
Deepfakes employ artificial intelligence and machine learning to create convincing synthetic reproductions of human speakers.
How convincing? Consider that, in a well-reported 2019 AI-powered crime, thieves used deepfake technology to mimic the voice of a company’s CEO and direct a subordinate to transfer $250,000 out of the corporate account to their own bank account. The funds were transferred and redistributed, and no suspects were ever identified. Another widely reported deepfake-enabled financial heist involved the cloning of a bank director’s voice and the illegal transfer of $35 million.
But before you scoff at the idea of being fooled by a phony voice, take a moment to learn the facts as well as the risks and liabilities that threaten large and small organizations alike. And learn ways to help you avoid being victimized by this increasingly concerning digital danger.
Speech and audio deepfakes are synthetic media content created or altered with the help of deep learning to persuade people that they are real, according to Jelle Wieringa, security awareness advocate at KnowBe4, a security awareness training provider.
“While the technology can be used with good intent, for instance by re-creating a person’s voice who has lost it, it can also be used by bad actors for malicious purposes. The state of current deepfake technology is such that very convincing audio can be created that is indistinguishable from the real thing by an untrained ear. This has led to deepfakes becoming a tool for bad actors to mislead people,” Wieringa says.
Ask Robert Wakefield-Carl, senior director of innovation architects at Avtex Solutions, an IT service management company, and he’ll tell you that speech deepfakes represent a disturbing new development in technology that can lead to misunderstanding, slander, and misrepresentation, “much like the early days of photoshopping the head of one individual onto the body of another—usually to put that person in a poor situation.”
Eangelica Aton, product owner at Gryphon.ai, a provider of AI-driven conversational intelligence, says the extent to which speech deepfakes focus on voice mimicry can be alarming.
“By using audio files and neural voice puppetry, tones and pitches can be measured to create a map of someone’s voice, creating a synthesized voice to use in a text-to-voice system,” she explains. “In a speech deepfake, you can make the newly cloned voice say anything you type. Combining both speech and video deepfakes can lead to an almost indistinguishable persona of any living and non-living person.”
Industry research firm Gartner predicts that within two years, 20 percent of all successful account takeover attacks will use deepfakes and synthetic voice augmentation.
The Deepfake Threat
“The first thing to understand is that there is no foolproof way to prevent deepfakes,” says Andrew Selepak, a social media professor at the University of Florida. “They can be spread on the internet, via social media, in emails, messaging apps like WhatsApp or Telegram, or even SMS messages, so no amount of moderation or government regulation will prevent them.”
He warns that deepfake tech could most definitely be used for criminal purposes or to ruin a person’s reputation.
“Just by using speech deepfake technology, someone with only minimal audio editing and computer skills could alter speeches from the past. President Roosevelt’s fireside chats could be altered to make him say he supports the Nazi regime, or Martin Luther King’s ‘I Have a Dream’ speech could be altered to preach violence,” Selepak says. “Deepfake technology could also be used to start riots, wars, and encourage violence. The lack of media literacy in this country and around the world is a perfect stage to allow deepfake technology to flourish.”
Speech deepfakes present an increased risk for companies and organizations, especially when it comes to safeguarding confidential and valuable data.
“Speech deepfakes can be used to take over user accounts, spoof internal processes, and request money transfers. These can be heavily damaging to a company’s profitability and reputation and the well-being of its customers,” notes Sanjay Gupta, vice president and global head of products and corporate development at Mitek Systems, a software company that specializes in digital identity verification and mobile capture. “Any organization that has voice commands heavily built into their user interface and security protocols are at increased risk for a speech deepfake incident.”
Say, for instance, a speech deepfake of a CEO shares disinformation about a company.
“It can significantly impact the company’s reputation, consumers’ opinions, and even the stock price,” warns Matt Muldoon, president of North America at text-to-speech systems provider ReadSpeaker.
The technology poses a great risk because voices generally offer more intimate access to people’s trust, making them a potent tool for fraud.
“If you hear the voice of someone you trust on the phone, you are probably unlikely to suspect a scam, for now. Once that trust changes, the real danger lies in the fact that people could become even more suspicious of speech deepfakes, resulting in more people seeking a kind of semi-technological remedy for trust,” Aton says. “This kind of deepfake technology comes with the real possibility of making actual communication much more cumbersome and, at times, impossible. By increasing a kind of neurosis in people who will eventually become ever more vigilant and distrustful, it will become easier for other information channels to dominate, such as advertisement-laden social media.”
Many assume that video deepfakes are more dangerous and damaging than sham audio. But Kathy Sobus, senior director of customer experience strategy at ConvergeOne, an IT service management firm, begs to differ.
“Audio deepfakes are more concerning because victims can be pressured into doing things that they wouldn’t necessarily do with video,” Sobus says. “A great example of this would involve transferring money. Let’s say you handle the financials for your company, and someone calls you and asks you to act immediately. What would you do? If that person sounds like your boss or someone else who would have the authority to ask you to make that request, odds are you might react without questioning it because of how legitimate the caller seemed. However, if that request was made today via a video deepfake, you might have been able to detect it as fraud.”
Who Is Most Vulnerable?
The organizations most susceptible to speech deepfakes are, unsurprisingly, those that are least prepared, Aton warns.
“Most corporations and governments will find a way to combat deepfakes by updating the sophistication of the monitoring abilities of their tech stack. However, not all companies are recognizing the threat or are equipped with updated security and compliance measures, creating vulnerabilities for deepfakes to infiltrate,” Aton continues. “That’s why it’s crucial to educate organizations about the threat that deepfakes pose and the importance of keeping their tech up to date.”
Because deepfakes are a highly specialized tool that can be leveraged to extort money and data, awareness of them has not yet spread widely among employees. This opens points of attack on which bad actors can pounce, Wieringa cautions.
“Any organization that has no mitigative measures in place against deepfakes is at risk. This includes enterprises that rely on audio confirmation for important processes, like banks and insurance companies,” he says. “However, cybercriminals do not discriminate. They go after everyone who has something valuable, whether this is money, data, or access to valuable information of high-profile people.”
Politicians and public figures are particularly exposed to the hazards of speech deepfakes.
“They have a lot of source material out there for an unscrupulous person to take advantage of. Politicians also have the most to lose by having words put in their mouth and suddenly finding themselves on the record for something they never said,” notes David Ciccarelli, CEO of Voices, a provider of voice talent.
The sticky wicket is that audio deepfakes occur in real time. Therefore, monitoring and detection technology needs to be built into networks to better ferret out the frauds.
“Technologies are being developed both by academic research labs and startups that are capable of predicting if an audio sample is genuine. But with the advances in deepfake generation, it is an uphill battle,” Wieringa continues.
Gupta points out that there are new voice verification technologies that use a sample of a voice to prove it’s coming from a live person and that it’s really the user’s voice, and they can do so with high accuracy.
“By combining facial liveness technology with voice recognition, we can significantly decrease risk and proactively reaffirm user identity,” Gupta adds.
Several initiatives, such as C2PA and the CogSec Collaborative, work to safeguard against the malicious use of deepfakes. New technology has been developed that helps differentiate between real and synthetic voices. And researchers are continually working to create technologies that can detect audio deepfakes. Case in point: Joel Frank and Lea Schönherr, from the Horst Görtz Institute for IT Security at Ruhr University in Bochum, Germany, gathered 118,000 samples of synthesized voice recordings, amounting to almost 196 hours of fake audio in English and Japanese. When comparing the deepfake recordings with real speech, they noticed subtle differences in the high frequencies between the real and fake files.
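To make the idea of "differences in high frequencies" concrete, here is a minimal sketch in Python that measures how much of a clip's spectral energy sits above a chosen cutoff. It is not the researchers' method: the function names, the 4,000 Hz cutoff, and the two toy clips built from pure tones are all assumptions made for illustration, and a real detector would work on recorded speech with a trained model.

```python
import cmath
import math


def high_band_energy_ratio(samples, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz, using a naive DFT."""
    n = len(samples)
    total = 0.0
    high = 0.0
    # For a real-valued signal only bins 1..n/2 are needed; bin 0 (DC) is skipped.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= cutoff_hz:
            high += energy
    return high / total if total else 0.0


def tone(freq_hz, sample_rate, n, amplitude=1.0):
    """Pure sine tone, used here only to build toy test signals."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


SAMPLE_RATE = 16000  # assumed sampling rate in Hz
N = 256              # short window keeps the naive DFT fast

# Two hypothetical clips: the same low tone with a weak or a strong
# high-frequency component mixed in.
low = tone(500, SAMPLE_RATE, N)
clip_a = [a + b for a, b in zip(low, tone(6000, SAMPLE_RATE, N, 0.1))]
clip_b = [a + b for a, b in zip(low, tone(6000, SAMPLE_RATE, N, 0.5))]

ratio_a = high_band_energy_ratio(clip_a, SAMPLE_RATE, 4000)
ratio_b = high_band_energy_ratio(clip_b, SAMPLE_RATE, 4000)
print(f"clip_a: {ratio_a:.3f}  clip_b: {ratio_b:.3f}")
```

The two clips sound alike in the low range but differ in this one number, which is the kind of statistic a detector could compare between genuine and synthetic recordings.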
“Biometric authentication technology can also be used by an organization relying on voice authentication as a way to identify and authenticate users. As with facial biometrics, using this technology would mean that organizations would need to store characteristics of a user’s voice. But while being a valid defense against voice-based attacks, it opens a whole host of new attack vectors and possible issues,” Wieringa says. “And the evolution of deepfakes might render this technology useless as the quality of audio deepfakes continues to grow.”
Additionally, because this technology is not yet widespread, the onus is on organizations to better understand the risks and pitfalls of speech deepfakes, Muldoon notes.
Best Practices to Prevent Being Speech-Scammed
The situation is sobering but not hopeless. There are things you and your organization can do to help safeguard against a costly and harmful speech fake-out.
Sobus suggests the following tips:
- Stay in the know about the latest deepfake phishing and social engineering tactics.
- Make cybersecurity a top priority at your institution. Create a team dedicated to this topic to regularly evaluate your protocols and educate those around them about recent developments and technologies as well as fraudulent activity to avoid.
- Work with a reputable IT services provider that understands the evolving cybersecurity landscape and can help your IT team determine the best ways to protect your company.
In addition, similar to educating employees about malware and email phishing attempts, “organizations should add deepfake education to employee training,” Muldoon says. “Companies should teach employees about how the technology can be leveraged maliciously and what employees should do if they suspect they’ve received a deepfake video or audio recording.”
Wieringa shares that sentiment.
“Make your staff aware of the existence and characteristics of audio deepfakes,” he suggests. “Also, research the impact audio deepfakes might have on your organization and look for the weak spots.”
Consider deploying two-factor (or more) authentication, too.
“By using a one-time token or confirmation via an alternate channel, like SMS, a company can ensure it’s the legitimate person making a change or transaction,” Gupta says.
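As a rough illustration of the one-time token idea, the Python sketch below issues a short-lived, single-use code that would be sent over a second channel and then checked before a transaction goes through. The class name, the six-digit format, the five-minute lifetime, and the request ID are all hypothetical choices, and actually delivering the code (by SMS or otherwise) is left out.

```python
import hmac
import secrets
import time


class OneTimeCodeChecker:
    """Issues short-lived, single-use codes for confirming a request."""

    def __init__(self, ttl_seconds=300):
        self.ttl_seconds = ttl_seconds
        self._pending = {}  # request_id -> (code, expiry timestamp)

    def issue(self, request_id, now=None):
        now = time.time() if now is None else now
        code = f"{secrets.randbelow(10**6):06d}"  # random six-digit code
        self._pending[request_id] = (code, now + self.ttl_seconds)
        return code  # to be sent over the second channel, not the call itself

    def confirm(self, request_id, submitted_code, now=None):
        now = time.time() if now is None else now
        # Single use: the code is retired on the first attempt, right or wrong.
        entry = self._pending.pop(request_id, None)
        if entry is None:
            return False
        code, expires_at = entry
        if now > expires_at:
            return False
        # Constant-time comparison avoids leaking how many digits matched.
        return hmac.compare_digest(code.encode(), submitted_code.encode())


checker = OneTimeCodeChecker(ttl_seconds=300)
code = checker.issue("wire-transfer-001", now=1000.0)  # hypothetical request ID
ok_first = checker.confirm("wire-transfer-001", code, now=1100.0)
ok_replay = checker.confirm("wire-transfer-001", code, now=1101.0)
print(ok_first, ok_replay)
```

The point of the design is that a cloned voice alone is not enough: the caller must also control the second channel, and a code that has been used, guessed wrongly, or left to expire cannot be replayed.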
AI-powered biometric authentication methods could also be especially effective because they require an individual’s unique voiceprint, thumbprint, or facial features, captured via a device such as a phone or webcam.
“Biometric authentication is a two-step process—first, you identify the individual and then you authenticate that the individual is who they say they are,” Sobus says.
What’s more, companies should ensure that audio samples are ethically used, securely stored, and destroyed when no longer needed, advises Kara McWilliams, associate vice president of the AI Labs at ETS.
“Strict data management will reduce the likelihood that audio data cannot be hacked and used for fraudulent purposes,” McWilliams says. “For prominent public figures or politicians where the content of the voice is critical, companies should apply security measures to that audio. Audio watermarks or a cryptographic hash using blockchain or similar technologies, such as a digital signature, can guarantee and verify the authenticity of the voice sample or content.”
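The hashing and signing idea can be sketched with Python's standard library. This is an assumption-laden illustration, not a description of any product: the key and the "audio" bytes are placeholders, and the keyed tag shown here (HMAC-SHA256) is a shared-secret stand-in for a true public-key digital signature, which would need a third-party cryptography library. Audio watermarking and blockchain anchoring are not shown.

```python
import hashlib
import hmac


def fingerprint(audio_bytes):
    """SHA-256 digest: changes if even one byte of the audio changes."""
    return hashlib.sha256(audio_bytes).hexdigest()


def sign(audio_bytes, key):
    """Keyed tag (HMAC-SHA256) that only holders of the key can produce."""
    return hmac.new(key, audio_bytes, hashlib.sha256).hexdigest()


def verify(audio_bytes, key, tag):
    """True only if the tag matches this exact audio under this key."""
    return hmac.compare_digest(sign(audio_bytes, key), tag)


# Placeholder values for illustration; a real key needs secure storage.
key = b"hypothetical-secret-key"
original = b"\x00\x01\x02\x03 stand-in for audio file bytes"
tampered = original + b"\x00"

tag = sign(original, key)
print(verify(original, key, tag))   # the untouched clip checks out
print(verify(tampered, key, tag))   # the altered clip does not
```

A plain hash on its own only shows that a file has changed; the keyed tag additionally ties the clip to whoever holds the key, which is the property a published signature would give a public figure's recordings.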
Lastly, when it comes to online communication, organizations can insist that a user provide information that an impostor working from recorded material is unlikely to possess, such as a sign/counter-sign code that verifies the person’s authenticity, Aton recommends.
Erik J. Martin is a Chicago area-based freelance writer and public relations expert whose articles have been featured in AARP The Magazine, Reader’s Digest, The Costco Connection, and other publications. He often writes on topics related to real estate, business, technology, healthcare, insurance, and entertainment. He also publishes several blogs, including martinspiration.com and cineversegroup.com.