2023 Speech Industry Award Winner: Resemble AI Fights for Responsible Use of Voice Clones
A study in England found that people could correctly identify speech deepfakes (synthetic voices produced by artificial intelligence to mimic the voices of real people) only 73 percent of the time, raising serious concerns about security threats. Deepfakes have already been used to trick bankers into authorizing fraudulent money transfers and to trick grandparents into believing that a grandchild had been arrested and needed money for bail.
As speech synthesis technologies become more advanced, experts predict that such deepfakes will only become more difficult to detect. They maintain that training people to detect speech deepfakes is unrealistic and instead urge the speech industry to focus on building and refining automated detectors.
That has been a central focus of Resemble AI, provider of a platform that uses generative AI to create realistic-sounding voices. In July, the Toronto-based company released Resemble Detect, a deepfake audio detector.
Resemble Detect validates the authenticity of audio data to expose speech deepfakes in real time with up to 98 percent accuracy, according to the company. It also analyzes audio across all forms of media and against all modern generative AI speech synthesis solutions.
Resemble Detect uses a state-of-the-art deep neural network and an AI model trained to distinguish fake audio from real audio based on artifacts (the sonic material that results from the editing or manipulation of sound) and the audio's spectrogram. The software also gives confidence scores from 0 to 100.
Resemble Detect was released as a complement to the PerTh Watermarker, which Resemble AI released earlier this year. The Watermarker technology uses AI to produce and insert audio tones that are imperceptible to the human ear and carry identifying information.
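For readers unfamiliar with the terms, the sketch below illustrates what a spectrogram is and how a model output might be turned into a 0-to-100 score. It is not Resemble's implementation: the function names, the frame sizes, and the idea that a higher score means "more likely fake" are all assumptions made for illustration, and the naive transform used here is far too slow for real audio.

```python
import cmath
import math


def hann(n):
    """Hann window, used to soften the edges of each audio frame."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram: one list of frequency-bin magnitudes per frame.

    Uses a naive discrete Fourier transform, which is fine for a demo
    but much slower than the FFT a production system would use.
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def confidence_score(probability_fake):
    """Map a hypothetical classifier probability in [0, 1] to a 0-100 score.

    Assumption: higher means more likely fake. The article does not say
    which direction Resemble Detect's scores run.
    """
    clamped = min(1.0, max(0.0, probability_fake))
    return round(clamped * 100)


# A pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)
peak_bins = [frame.index(max(frame)) for frame in spec]
```

In a detector of this general kind, a trained classifier would look at the spectrogram frames rather than the raw waveform, and its output would be reported through something like `confidence_score`.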
“Our mission is to make interactions with digital products as human and natural as possible,” says Zohaib Ahmed, CEO of Resemble AI, who cofounded the company in 2019. “We want to make AI tools available to more people, but we also acknowledge that we can’t talk about AI without talking about ethics. We’re proud to release Resemble Detect, another solution in our toolkit to ensure legitimacy.”
To that end, Resemble AI also requires explicit user consent to clone voices and has strict internal guidelines to prevent malicious use of the voices it creates. The company, for example, requires users to record a consent clip in the voice they’re attempting to clone. If the voice in the clip doesn’t match the other clips, Resemble blocks the user from creating the AI voice. In addition, users have to say several specific sentences in their own voices, and if they deviate from the script, Resemble flags the recording as potential misuse.
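The consent checks described above can be pictured as a simple decision flow. The following is a minimal sketch under stated assumptions, not Resemble's actual system: it supposes that each clip has already been reduced to a numeric "voice embedding" and a text transcript by some other tool, and the function names, the 0.8 similarity threshold, and the exact-match script comparison are all hypothetical.

```python
import math
import re


def normalize(text):
    """Lowercase the text and keep only its words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def follows_script(transcript, script):
    """True if the spoken words match the required sentences exactly."""
    return normalize(transcript) == normalize(script)


def cosine_similarity(a, b):
    """Similarity of two voice embeddings (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def review_consent(consent_embedding, sample_embeddings, transcript, script,
                   threshold=0.8):
    """Return "blocked", "flagged", or "approved" for a cloning request.

    The threshold is an illustrative value, not a published figure.
    """
    # The consent clip must sound like every clip used to build the voice.
    for embedding in sample_embeddings:
        if cosine_similarity(consent_embedding, embedding) < threshold:
            return "blocked"
    # Deviating from the required sentences is treated as potential misuse.
    if not follows_script(transcript, script):
        return "flagged"
    return "approved"


script = "I consent to having my voice cloned."
same_voice = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]
other_voice = [[0.0, 1.0, 0.0]]

ok = review_consent([1.0, 0.0, 0.0], same_voice,
                    "I consent to having my voice cloned", script)
mismatch = review_consent([1.0, 0.0, 0.0], other_voice,
                          "I consent to having my voice cloned", script)
off_script = review_consent([1.0, 0.0, 0.0], same_voice,
                            "Sure, go ahead and clone it", script)
```

The ordering mirrors the article: a voice mismatch stops the request outright, while an off-script recording is only flagged for review.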
While the ethical use of speech and technologies to detect unethical use are important to Resemble, they are not the only areas where the company was active this year. This past summer, Resemble also made significant improvements to its Speech-to-Speech model, increasing accuracy and robustness. With real-time voice conversion, natural-sounding AI voices can be created across gaming, entertainment, interactive voice response systems, and more. These AI voices can perform a wide range of emotions, speaking styles, and languages.
That gaming is one of the main uses for the company’s technology is not surprising. After all, Resemble started small, focusing mostly on gaming and expanding into other areas as it built steam. And it has built a lot of steam in just a short time. According to the company, its platform has more than 1 million users who have generated more than 35 years’ worth of audio in just the past few months.
The company has also generated a lot of interest from investors. It is fresh off a recent $8 million funding round, which brought the total raised to more than $12 million, and is committing the money to developing its enterprise products and expanding its workforce.