2023 Speech Industry Award Winner: D-ID Gives a Human Face and Voice to AI
D-ID, an Israeli company founded in 2017, is providing superpowers to individual creators and businesses alike, uniquely enabling them to transform any picture into an interactive video in seconds.
The company offers a self-service studio and an API that uses generative artificial intelligence to create talking avatars at the click of a button. These avatars are personalized, cost-effective, engaging, and available in 120 languages.
The centerpiece of D-ID’s portfolio is Creative Reality Studio, which uses AI to create videos with photorealistic avatars and narration from a single image. The software animates still photos to facilitate high-quality video productions. The API is powered by neural networks trained on tens of thousands of videos.
The technology enables users to choose the avatar’s identity—including ethnicity, gender, and age—along with its language, accent, and intonation. Users can select high-definition Premium Presenters, with lifelike upper body and facial movements; Custom Premium Presenters, which require short video shoots with the subjects; or Special Presenters, with videos generated from any front-facing photo and text or audio. Videos can be created in multiple languages by simply translating the text rather than engaging different presenters fluent in each language.
D-ID offers an extensive library of neural voices across 119 languages and variants, audio input that lets users upload their own recordings for a more personalized voice-driven digital presenter, and voice cloning to match a specific voice.
In an update of Creative Reality Studio released within a few months of the initial launch, D-ID enabled users to add facial expressions to their avatars and select video formats, layers, positioning, and transparency.
A partnership with AI voice technology provider ElevenLabs later brought additional voice options to Creative Reality Studio. With the addition of nine ElevenLabs voices to its library, users could convey emotions, emphasis, and personality, together with four visual expressions: serious, happy, surprised, or neutral.
ElevenLabs’ cofounder and CEO, Mati Staniszewski, at the time called D-ID “a leading player in the text-to-video field,” and its technology “top-rated.”
Another Creative Reality Studio update integrated GPT-3 from OpenAI and Stable Diffusion from Stability AI, letting these deep learning models generate digital composite faces and speech from users’ text prompts.
“We are only beginning to see the potential of creative generative AI technologies, and they are set to take the world by storm,” said Gil Perry, CEO and cofounder of D-ID, in a statement. “We are incredibly proud to be at the bleeding edge of the emergent generative AI scene, introducing the first platform to offer image and text generation together with animation, providing unlimited creative possibilities and potential.”
Creative Reality Studio isn’t D-ID’s only speech-enabled product. Another very popular product is Speaking Portrait, used by millions of MyHeritage users to give a voice to their family stories. Speaking Portrait uses D-ID’s advanced animation capabilities to create digital personas that speak with natural language and emotion.
And then there’s chat.D-ID, a web app that lets users hold real-time video conversations with digital avatars by combining D-ID’s text-to-video technology with OpenAI’s ChatGPT.
In essence, chat.D-ID gives ChatGPT a voice and face. When users open the app, they’re greeted by an avatar named Alice, with whom they can communicate by voice or text.
As it did with Creative Reality Studio, D-ID soon updated chat.D-ID so users could choose their own images or avatars for D-ID’s generative AI to animate. Users can select an avatar and voice and provide a brief character description, uploading their own photos or images of historical, fictional, or AI-generated figures and instructing the app to answer as if it were that person, in character.
Chat.D-ID, like all of D-ID’s products, lets users experience what it’s like to chat with a digital person, adding a human touch to AI. And that humanlike interface has been the Holy Grail of the speech industry since its earliest days.