July 11, 2023
By Arto Yeritsyan, Founder and CEO of Podcastle
Industry Voices

How Voice Cloning is Transforming Podcasting

Podcasting has come a long way since the days of right-clicking and saving an mp3 file to your desktop. Big tech companies have made podcasting a major part of their content play, and video podcasting has been a huge step forward, with YouTube and Spotify getting in on the act.

The other big leap in podcasting is the arrival of text-to-speech technology, which allows creators to produce audio content without saying a word. Although this technology has been deployed for many years in customer service, virtual assistants, and speech recognition, it's relatively new in the world of content creation. Since January, Google searches for AI voices have grown tenfold, a sign of increased public interest that is undoubtedly driven by broader coverage of generative AI tools.

What's also new is the ability to create a clone of your own voice. In the past, options to generate speech might have been limited to a set number of pre-programmed voices, but now creators can easily duplicate their own voices at low cost. For podcasters and other creators, there are several benefits to using this technology. The first is efficiency. Rather than having to go back and re-record a segment to fix an error or update a promotional snippet, they can use an AI-powered voice to do it more quickly. They could even skip the recording process altogether and create voice-overs or introductions entirely with text-to-speech.

It also means they're less dependent on having access to recording equipment and a suitable location. Imagine you're a self-employed content creator who works remotely and you need to put together some recordings quickly, but you don't have your microphone, pop shield, or soundproof room. These challenges can be overcome by simply logging into the software and accessing the synthetic version of your voice to start creating.

It's clear that this technology has huge potential, but as we've seen with other generative AI technologies, it can also be abused. So, how can we ensure that the technology is being used ethically?

Prioritizing Safety in Voice Synthesis

From a user's perspective, creating a clone of one's own voice can seem almost magical. Provide audio samples, click a button, and within minutes or hours you have a voice that sounds uncannily like your own. While the back-end process of creating AI voices is more complex, it doesn't necessarily require huge amounts of data to be successful. The major challenge is to make the voice realistic, to capture users' natural intonation, and to make it accurate as well as spontaneous.

The potential problems that come with voice cloning relate to information security and illegal activities like phishing (the act of soliciting information via impersonation). In theory, you could clone the voice of a famous person or of someone you know and use that to gain access to financial accounts or to deceive someone.

This is why it's essential that companies developing this technology consider the risks, just as they would with any other powerful technology. For example, with one platform, creators who wish to clone their voice are required to perform a live reading of a script of around 70 specific phrases. This process can take up to 30 minutes and must be done by the person whose voice is being duplicated.

These 70 recordings are then manually checked to ensure that they are accurate and come from a single voice, and only then are they processed through an AI model. It's therefore not possible to simply upload audio from a TV show or interview, or a short clip of someone speaking, and create a copy of the voice.

AI-powered podcasting represents an incredibly productive step forward for podcast creators, allowing them to focus on what they do best and reducing the pain involved in the creative process. Just as audio recording and editing software meant no longer having to rely on physical media, AI-powered software means no longer having to rely on manual processes that slow things down.

The potential of voice duplication and synthetic voices is huge for all media, podcasts included, and ensuring that it's used responsibly should be a key aim for all purveyors of the technology.
