Deepdub Launches Phantom X 3.2 Dubbing Model

Deepdub, a voice artificial intelligence company, today launched Phantom X 3.2, an AI speech model for dubbing and real-time voice agents.

With enhanced voice quality, multilingual capabilities, and ultra-low latency, Phantom X 3.2 is built for scalable, high-quality AI voice and dubbing solutions.

Deepdub GO, the company's localization platform, is now powered by Phantom X 3.2. GO helps production teams generate, review, and deploy AI dubbing across dozens of languages within high-volume localization pipelines.

Phantom X 3.2 produces professional-quality voice output with human-like delivery across extreme pitch, speed, and prosody ranges and supports zero-shot voice cloning from as little as one second of reference audio, even from noisy or degraded source material. Expanded emotion styles, including Joy, Giggle, and Laughter, can be layered within a single line, and a new Key Names and Phrases (KNP) system ensures consistent pronunciation and translation of recurring character names and technical terms across full episodes and series.
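Deepdub has not published the KNP format, but the general idea of a key-names-and-phrases glossary can be sketched as follows. All names and data here are hypothetical: a table locks the translation (and optionally a pronunciation hint) for each recurring term, and the lookup is applied to every line before dubbing so the term stays consistent across an entire series.

```python
# Illustrative sketch only: the KNP format is not public, so this glossary
# structure and helper are assumptions. Each source term maps to a locked
# target translation plus an optional phonetic hint.
KNP_GLOSSARY = {
    "Eldoria": ("Eldoria", "el-DOR-ee-ah"),      # fictional place name
    "warp core": ("núcleo de curvatura", None),  # fictional technical term
}

def apply_knp(line: str, glossary: dict) -> str:
    """Replace glossary terms with their locked translations before dubbing."""
    for source, (target, _hint) in glossary.items():
        line = line.replace(source, target)
    return line

print(apply_knp("The warp core in Eldoria is failing.", KNP_GLOSSARY))
# -> "The núcleo de curvatura in Eldoria is failing."
```

Because the glossary is applied uniformly to every line, a character name or technical term cannot drift between episodes, which is the consistency guarantee the KNP system is described as providing.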

The model's precision phonetics for stress-timed languages ensures correct pronunciation in languages such as Russian and Hebrew, where incorrectly placed stress can change a word's meaning.

Phantom X 3.2 enables streaming platforms and studios to localize series into 10-20 languages simultaneously while maintaining consistent character voices, accurate pronunciation of names and terms, and natural performance across episodes. The model also supports animation and franchise localization, large catalogue dubbing of films and television libraries, fast-turnaround localization for trailers, promos, and global releases, and natural narration for documentaries and unscripted programming.

For real-time voice agents, Phantom X 3.2 delivers approximately 125-millisecond end-to-end latency, making it suited for customer support, virtual assistants, and interactive AI pipelines. Speech generation begins as text arrives, processing the remainder of each sentence in parallel to enable natural, uninterrupted real-time conversations. The model also maintains consistent voice identity, emotional control, and audio quality across extended interactions, with automatic speaker gender detection that persists throughout a session.
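Deepdub's real-time API is not public, so the following is an illustrative sketch, with all function names hypothetical, of the incremental approach the paragraph describes: synthesis starts as soon as a sentence boundary arrives in the incoming text stream, rather than waiting for the full response, which is how such pipelines keep end-to-end latency low.

```python
# Illustrative sketch only: chunked text arrives (e.g. from an LLM), and
# complete sentences are emitted for synthesis as soon as they are detected,
# so audio playback can begin while later text is still streaming in.
import re
from typing import Iterator

def sentences(text_stream: Iterator[str]) -> Iterator[str]:
    """Yield complete sentences as text chunks arrive."""
    buffer = ""
    for chunk in text_stream:
        buffer += chunk
        # A sentence ends at ., !, or ? followed by whitespace.
        while (match := re.search(r"[.!?]\s", buffer)):
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    if buffer.strip():
        yield buffer.strip()

def synthesize(sentence: str) -> bytes:
    """Hypothetical stand-in for the actual TTS call."""
    return sentence.encode()

# Audio for the first sentence is ready while the second is still arriving.
stream = iter(["Hello there. How can", " I help you today?"])
audio_chunks = [synthesize(s) for s in sentences(stream)]
```

The design choice is the same one the paragraph names: per-sentence parallelism means the listener hears the first sentence after roughly one synthesis step of delay instead of waiting for the whole reply.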

"The demands on voice AI have never been more complex or more consequential," said Ofir Krakowski, CEO and co-founder of Deepdub, in a statement. "Content owners and global enterprises need every language to feel native and every conversation to feel human. But beyond quality, the economics of localization are being rewritten. Streaming platforms can now make on-demand localization decisions as content breaks through in a new market without pre-committing budgets to languages that may never be needed. With Phantom X 3.2, we've built a model that meets every bar simultaneously: Hollywood-grade expressiveness, real-time responsiveness, and the unit economics that make agile, language-by-language expansion a real business decision rather than a gamble. And this is just the beginning. We're continuing to push the boundaries of what's possible in dubbing and localization, with agentic AI workflows that will further automate and orchestrate pipelines end-to-end, making world-class localization faster, smarter, and more accessible than ever before."