
aiOla Launches Drax Open-Source Speech Model


aiOla, a voice AI lab focused on advancing speech recognition, today introduced Drax, an open-source AI model that applies flow matching, a generative modeling technique, to speech recognition.

According to aiOla, Drax captures the nuance of real-world audio and matches or exceeds the accuracy of leading models such as OpenAI's Whisper. The company also says Drax delivers five times lower latency than other major speech systems, such as Alibaba's Qwen2, and runs up to 32 times faster than real time.
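The speed claim is easier to read as arithmetic. The sketch below is a minimal illustration, assuming that "N times faster than real time" means N seconds of audio are processed per second of compute; the helper names are hypothetical, and the hardware, batch size, and audio behind aiOla's 32x figure are not stated here.

```python
def transcription_seconds(audio_seconds: float, times_real_time: float) -> float:
    """Wall-clock seconds needed to transcribe `audio_seconds` of audio.

    ASSUMPTION: "N times faster than real time" means N seconds of audio are
    processed per second of compute.
    """
    if times_real_time <= 0:
        raise ValueError("times_real_time must be positive")
    return audio_seconds / times_real_time


def real_time_factor(times_real_time: float) -> float:
    """Real-time factor (processing time / audio duration); below 1 is faster than real time."""
    return 1.0 / times_real_time


if __name__ == "__main__":
    one_hour = 3600.0
    print(transcription_seconds(one_hour, 32))  # 112.5 seconds, under two minutes
    print(real_time_factor(32))                 # 0.03125
```

At the quoted best case, an hour-long recording would take a little under two minutes to transcribe.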

Drax introduces a new way to train speech recognition systems that achieves both speed and accuracy. While most automatic speech recognition (ASR) models process speech sequentially, predicting one token at a time, Drax outputs the entire token sequence in parallel, capturing the full context at once. According to aiOla, this parallel, flow-based approach dramatically reduces latency and prevents the compounding errors that often occur in long transcriptions, where each new token is conditioned on earlier, possibly wrong, predictions.
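The latency difference between sequential and parallel decoding comes from how many dependent model calls each one needs. The toy sketch below is not Drax: `toy_model`, `MASK`, and the fake audio features are hypothetical stand-ins chosen so both decoders return the same tokens, which makes their call counts directly comparable. Autoregressive decoding needs one call per token, so its cost grows with transcript length; parallel decoding runs a fixed number of refinement steps, each updating every position at once.

```python
from typing import Callable, List, Tuple

MASK = -1  # hypothetical placeholder for a not-yet-decided token
Model = Callable[[List[float], List[int]], List[int]]


def toy_model(audio: List[float], draft: List[int]) -> List[int]:
    """Hypothetical stand-in for a speech model: predicts every position at once.

    Each fake audio feature simply rounds to its token id. A real model would
    also use `draft`; this one ignores it so both decoders below agree.
    """
    return [round(x) for x in audio]


def autoregressive_decode(model: Model, audio: List[float]) -> Tuple[List[int], int]:
    """Left to right: one model call per token, each fed the tokens decoded so far."""
    tokens: List[int] = []
    calls = 0
    for i in range(len(audio)):
        draft = tokens + [MASK] * (len(audio) - len(tokens))
        tokens.append(model(audio, draft)[i])  # keep only position i
        calls += 1
    return tokens, calls


def parallel_decode(model: Model, audio: List[float], steps: int) -> Tuple[List[int], int]:
    """Fixed number of refinement steps, each updating every position at once."""
    draft = [MASK] * len(audio)
    calls = 0
    for _ in range(steps):
        draft = model(audio, draft)
        calls += 1
    return draft, calls


if __name__ == "__main__":
    audio = [3.1, 7.9, 2.0, 5.2, 1.0, 9.0]
    print(autoregressive_decode(toy_model, audio))     # ([3, 8, 2, 5, 1, 9], 6)
    print(parallel_decode(toy_model, audio, steps=2))  # ([3, 8, 2, 5, 1, 9], 2)
```

A real parallel decoder would condition each refinement step on the previous draft; the toy model ignores it so that only the number of sequential calls differs.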

"Gone are the days of choosing between transcription accuracy or speed," said Yossi Keshet, chief scientist at aiOla, in a statement. "With Drax, we've achieved a real breakthrough in speech recognition technology that delivers both technical innovation and immediate real-world impact. And, by open-sourcing it, we hope to spark further discovery and collaboration from the community."

Like image diffusion models that refine a picture from random noise, Drax learns to reconstruct a transcript from an initial noisy representation. During training, it learns along a three-stage probability path: starting with meaningless noise, passing through a speech-like but imperfect middle state, and finally converging on the correct transcript. According to the company, this middle stage exposes Drax to realistic, acoustically plausible errors, helping it stay robust to accents, background noise, and natural speech variability.
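The three-stage path can be pictured as a per-token mixture whose weights shift from noise toward the target as time t runs from 0 to 1. This is a minimal sketch under stated assumptions, not Drax's actual formulation: the quadratic weight schedule, the `confusions` table of acoustically similar words, and all function names are illustrative choices. It shows only how a corrupted training input might be sampled at a given t; the model that learns to predict the target from that input and the audio is omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple


def path_weights(t: float) -> Tuple[float, float, float]:
    """Return (noise, speech-like, target) mixture weights at time t in [0, 1].

    ASSUMPTION: quadratic Bernstein weights, chosen for illustration only.
    They always sum to 1, equal pure noise at t=0, give the middle state its
    largest weight (0.5) at t=0.5, and equal the target at t=1.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return (1 - t) ** 2, 2 * t * (1 - t), t ** 2


def sample_path(
    target: Sequence[str],
    t: float,
    vocab: Sequence[str],
    confusions: Dict[str, List[str]],
    rng: random.Random,
) -> List[str]:
    """Draw one corrupted version of `target` from the mixture at time t."""
    w_noise, w_mid, _ = path_weights(t)
    sample: List[str] = []
    for token in target:
        u = rng.random()
        if u < w_noise:
            # Stage 1: meaningless noise, a uniformly random vocabulary entry.
            sample.append(rng.choice(vocab))
        elif u < w_noise + w_mid:
            # Stage 2: a plausible mishearing from the hypothetical confusion
            # table; tokens without an entry fall back to themselves.
            sample.append(rng.choice(confusions.get(token, [token])))
        else:
            # Stage 3: the correct transcript token.
            sample.append(token)
    return sample


if __name__ == "__main__":
    rng = random.Random(0)
    target = ["their", "flight", "leaves", "at", "eight"]
    confusions = {"their": ["there", "they're"], "flight": ["fright"], "eight": ["ate"]}
    vocab = ["blue", "seven", "run", "table"] + target
    for t in (0.0, 0.5, 1.0):
        # At t = 1.0 the sample always equals the target.
        print(t, sample_path(target, t, vocab, confusions, rng))
```

The middle term is what distinguishes this from a plain noise-to-target path: at intermediate times, a sizable share of tokens are near misses rather than random junk.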

"There's no room for error in critical applications of speech technology," said Gil Hetz, vice president of AI at aiOla, in a statement. "That's why Drax is such a breakthrough; it combines accuracy and speed without compromise. It handles real-world speech, no matter the background noise, accent, or jargon, making it not only technologically impressive but also [delivering] the kind of reliability modern enterprises need."

"Voice is the most natural and efficient way for data entry, and it will become the default way for human-machine interaction," said Amir Haramty, president of aiOla, in a statement. "Today, transcription often fails to keep up with the pace of real-world operations, from customer service to compliance. With Drax, we're closing that gap and making voice technology actually practical at scale. That's why advancing speech recognition is so important: It's the future of the enterprise, and I'm incredibly proud to be part of aiOla, where we're continuously pushing the boundaries of what's possible."

On English benchmarks, Drax achieved an average word error rate (WER) of 7.4 percent, according to aiOla. The company said Drax maintained comparable or better accuracy in Spanish, French, German, and Mandarin Chinese, which it says shows consistent performance across languages and environments.
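Word error rate counts the substitutions (S), deletions (D), and insertions (I) needed to turn a system's output into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A 7.4 percent WER therefore means roughly 7.4 word errors per 100 reference words. The sketch below computes it with a word-level edit distance; it skips the text normalization (casing, punctuation) that benchmark scoring typically applies, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as a word-level Levenshtein distance with dynamic programming.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution (sat -> sit), one deletion (the)
    print(word_error_rate(ref, hyp))  # 0.3333333333333333 (2 errors / 6 reference words)
```

Because insertions count as errors, WER is not capped at 100 percent: a hypothesis much longer than the reference can score above 1.0.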

aiOla is releasing Drax under a permissive open-source license, making the model publicly available on GitHub and Hugging Face. The release includes three model sizes, from the lightweight Flash version to the full-scale base model.