
Modulate Expands Velma with Voice-Native Real-Time Conversation Intelligence

Modulate, a conversational voice intelligence company, today released its flagship Velma model through its developer API.

Previously available only to enterprises, the voice-native conversation intelligence model can now be accessed and deployed by any developer. Velma natively understands audio and provides live insights into emotion, intent, behavioral risk, and conversational context.

Velma Enterprise API helps organizations move from post-call analysis to continuous, real-time conversational understanding and intervention. It also takes them beyond transcription and point solutions toward a broader enterprise intelligence layer for live voice conversations. Powered by Modulate's Ensemble Listening Model (ELM) architecture, the API provides a real-time listening layer that identifies and interprets the signals that reveal what is actually happening in a conversation, beyond the words spoken.

"As enterprises deploy more AI across customer interactions, they're realizing that transcription alone is an incomplete foundation for understanding conversations," said Mike Pappas, CEO and co-founder of Modulate, in a statement. "The excitement we're seeing from operators, compliance teams, and customer experience leaders comes from finally having infrastructure that can interpret conversational and emotional context in real time, beyond the transcript."

Velma uses an ensemble of specialized models that work together to analyze conversational audio across multiple dimensions. Velma can detect emotional signals, conversational dynamics, behavioral patterns, and non-verbal cues that traditional speech-to-text pipelines often miss.

Velma Enterprise API can support the following use cases:

  • Fraud and risk detection, identifying signs of synthetic audio, urgency, manipulation, policy avoidance, or other risk signals during live interactions.
  • Customer experience and contact center intelligence, helping teams understand caller emotion, frustration, confusion, escalation risk, and service needs in real time.
  • AI agent oversight, detecting when AI agents might be making inaccurate claims, violating policies, or failing to respond appropriately to customer needs.
  • Trust and safety, recognizing harmful, abusive, or policy-violating behavior in live voice environments.
  • Operational intelligence, turning conversational audio into structured, explainable signals that can inform review, escalation, training, and decision-making workflows.
  • Compliance and vulnerable customer protection, helping organizations identify signs of distress, confusion, disclosure failures, or regulatory risk during live interactions.

"Fraud, customer dissatisfaction, policy violations, and AI failures don't politely happen only in the first 30 seconds of a call. Enterprises need systems that can listen continuously, explain what they are hearing, and help humans act quickly," Pappas added.