Synthesia Releases Express-2 Models
Synthesia, provider of an artificial intelligence video platform, has released its Express-2 video and audio models and a family of avatars based on those models.
These new models combine accent-preserving voice cloning with facial expressions, lip sync, and natural hand and body gestures to reproduce the performances of professional speakers.
Express-2 has been optimized for realism, control, and reliability at scale, generating 1080p, 30 frames-per-second videos of arbitrary length.
The AI engine consists of two tightly connected parts:
- Express-Voice, a system that instantly clones a voice from a few seconds of audio, preserving identity, accent, and expressiveness. In blind tests with 100 native English-speaking evaluators across 17 accents, Express-Voice was rated highest for matching the original speaker’s identity, rhythm, and accent.
- Express-Video, an avatar animation and rendering system that translates speech into natural gestures, facial dynamics, and photorealistic frames through three coordinated models based on diffusion transformers.
"Body language makes up for more than half of our communication and, with the release of our proprietary Express-2 models, our AI avatars have never been more engaging or felt more real," said Alexandru Voica, head of corporate affairs and policy at Synthesia, in a statement.