
Hanabi AI Launches OpenAudio S1


Hanabi AI, a voice technology startup, has released OpenAudio S1, an artificial intelligence-powered voice actor and generative voice model that delivers real-time emotional and tonal control.

OpenAudio S1 generates emotionally authentic vocal output with intentional pacing and expressive tonal variation, handling subtleties such as sarcasm, joy, sadness, and fear with cinematic depth. Whether a scene calls for the trembling hesitation of suppressed anxiety before delivering difficult news or the fragile excitement of an unexpected reunion, OpenAudio S1 lets users fine-tune vocal intensity, emotional resonance, and prosody in real time.

With S1, users can adjust tone, pitch, emotion, and pace in real time using not only simple prompts such as (angry) or (voice quivering) but also more nuanced or creative instructions such as (confident but hiding fear) or (whispering with urgency). This allows flexible, expressive voice generation tailored to a wide range of contexts and characters.
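To make the prompt style above concrete, here is a minimal sketch of how a script annotated with parenthesized directions could be split into (direction, text) pairs before being sent to a voice model. It is an illustration only: the helper name `split_directions` is hypothetical, it is not part of the fish.audio API, and it assumes each direction is written in parentheses immediately before the text it applies to, which the article's examples suggest but do not specify.

```python
import re
from typing import List, Tuple

# Hypothetical helper, not the fish.audio API. Assumes a direction such as
# "(whispering with urgency)" precedes the text it applies to.
MARKER = re.compile(r"\(([^()]+)\)")


def split_directions(script: str) -> List[Tuple[str, str]]:
    """Split an annotated script into (direction, text) pairs.

    Text before the first marker gets an empty direction. Consecutive
    markers with no text between them are joined with ", ".
    """
    pairs: List[Tuple[str, str]] = []
    pos = 0
    current = ""
    for match in MARKER.finditer(script):
        text = script[pos:match.start()].strip()
        if text:
            # Emit the text gathered so far under the active direction.
            pairs.append((current, text))
            current = ""
        marker = match.group(1).strip()
        # Stack markers that appear back to back, e.g. "(angry) (shouting)".
        current = f"{current}, {marker}" if current else marker
        pos = match.end()
    tail = script[pos:].strip()
    if tail:
        pairs.append((current, tail))
    return pairs


if __name__ == "__main__":
    demo = "(confident but hiding fear) We'll be fine. (whispering with urgency) Run."
    for direction, line in split_directions(demo):
        print(f"[{direction}] {line}")
```

Running the demo prints each sentence with its direction, e.g. `[whispering with urgency] Run.`; how a real service consumes such pairs is not described in the article.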

S1 supports 11 languages, handles multi-speaker scenarios (such as dialogues among several characters) in multilingual contexts, and switches seamlessly between languages without losing tonal consistency.

"We believe the future of AI voice-driven storytelling isn't just about generating speech; it's about performance," said Shijia Liao, founder and CEO of Hanabi AI, in a statement. "With OpenAudio S1, we're shaping what we see as the next creative frontier: AI voice acting."

OpenAudio S1 is powered by an end-to-end architecture with 4 billion parameters, trained extensively on diverse text and audio datasets, and fully integrated into the fish.audio platform. In independent third-party testing, it delivered sub-100 ms latency, making it well suited to real-time applications such as gaming, voice assistants, and live content creation, where immediate response is crucial.

"Voice is one of the most powerful ways to convey emotion, yet it's the most nuanced, the hardest to replicate, and the key to making machines feel truly human," Liao said. "But it's been stuck in a text-to-speech mindset for too long. Ultimately, the difference between machine-generated speech and human speech comes down to emotional authenticity. It's not just what you say but how you say it. OpenAudio S1 is the first AI speech model that gives creators the power to direct voice acting as if they were working with a real human actor."
