At the Interspeech 2021 conference last week, NVIDIA introduced RAD-TTS, a technology that lets individuals train an artificial intelligence model on their own voice, capturing its pacing, tone, pitch, and other qualities. RAD-TTS can also perform voice conversion, delivering one speaker's words in another person's voice.

Users can also fine-tune the synthesized voice frame by frame to emphasize or de-emphasize specific words, change the pace of the narration, alter the pitch, and more.

"With this interface, our video producer could record himself reading the video script and then use the AI model to convert his speech into the female narrator's voice. Using this baseline narration, the producer could then direct the AI like a voice actor — tweaking the synthesized speech to emphasize specific words, and modifying the pacing of the narration to better express the video’s tone," an NVIDIA executive wrote in a recent blog post.

NVIDIA is releasing RAD-TTS as open source through the NVIDIA NeMo Python toolkit.
