FlashLabs Releases Chroma 1.0 Voice AI Model

FlashLabs, an applied artificial intelligence research and engineering lab building real-time agentic systems, has released Chroma 1.0, an open-source, end-to-end, real-time speech-to-speech AI model with personalized voice cloning.

By operating natively in voice without the traditional ASR → LLM → TTS pipeline, Chroma enables natural, fluid conversations that feel immediate, responsive, and human. Chroma is natively speech-to-speech, enabling the following:

  • End-to-end time to first token (TTFT) under 150 milliseconds.
  • Natural conversational turn-taking.
  • Low-latency emotional and prosodic control.
  • Stable real-time inference without cascading delays.
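The latency argument above can be sketched with a small back-of-the-envelope comparison. The per-stage numbers below are illustrative assumptions, not measurements from FlashLabs; the point is that a sequential ASR → LLM → TTS pipeline accumulates each stage's delay before any audio can be emitted, while an end-to-end model has a single TTFT budget.

```python
# Hypothetical per-stage latencies in milliseconds (assumed for illustration,
# not measured figures from FlashLabs or Chroma's evaluation).
CASCADED_STAGES_MS = {"asr": 120, "llm_first_token": 180, "tts_first_audio": 90}
END_TO_END_TTFT_MS = 150  # the sub-150 ms figure cited for Chroma

def cascaded_ttft_ms(stages: dict) -> int:
    """In a sequential ASR -> LLM -> TTS pipeline, per-stage delays add up:
    no audio can be produced until every upstream stage has finished its
    first output, so the time to first audio is the sum of the stages."""
    return sum(stages.values())

if __name__ == "__main__":
    total = cascaded_ttft_ms(CASCADED_STAGES_MS)
    print(f"cascaded TTFT:   {total} ms")            # 390 ms with the assumed numbers
    print(f"end-to-end TTFT: {END_TO_END_TTFT_MS} ms")
```

With these assumed numbers the cascade lands well above the single end-to-end budget, which is the "cascading delays" problem the bullet list refers to.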

With Day-0 SGLang support, Chroma further reduces latency and improves throughput. It also introduces few-second reference voice cloning, allowing users to generate personalized voices from minimal audio input. In internal evaluations, it achieved the following results:

  • Speaker similarity score (SIM): 0.817.
  • +10.96 percent above human baseline (0.73).
  • Best-in-class performance among both open and closed baselines.
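Speaker-similarity (SIM) scores like the 0.817 above are conventionally computed as the cosine similarity between speaker embeddings of the generated and reference audio. The exact embedding model used in Chroma's evaluation is not specified here, so the sketch below only illustrates the metric itself, with `speaker_similarity` as a hypothetical helper name.

```python
import math

def speaker_similarity(emb_a: list, emb_b: list) -> float:
    """Cosine similarity between two speaker-embedding vectors: the usual
    basis for SIM-style metrics. A score near 1.0 means the cloned voice's
    embedding points in nearly the same direction as the reference's."""
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    return dot / (norm_a * norm_b)

if __name__ == "__main__":
    # Toy 2-D embeddings for illustration; real speaker embeddings are
    # typically hundreds of dimensions.
    print(speaker_similarity([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 1.0
    print(speaker_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 0.0
```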

Despite its compact 4B-parameter architecture, Chroma delivers strong reasoning and dialogue capabilities by leveraging modern multimodal backbones and optimized real-time inference. This makes it suitable for edge deployment in real-time voice applications, including the following:

  • Autonomous voice agents.
  • AI call centers.
  • Real-time translators.
  • Conversational assistants.
  • Interactive characters and non-player characters.
  • Multimodal AI systems.

"Voice is the most universal interface in the world, yet it has remained closed, fragmented, and delayed," said Yi Shi, founder and chief research and engineering officer of FlashLabs, in a statement. "With Chroma, we're open-sourcing real-time voice intelligence so builders, researchers, and companies can create AI systems that truly work at human speed."