Krisp Launches VIVA 2.0, an Infrastructure for Voice AI Agents
Krisp today launched Krisp VIVA 2.0, a voice artificial intelligence infrastructure layer for voice agents, interactive voice response systems, and conversational AI.
Krisp VIVA 2.0 introduces a new generation of small, real-time models that improve word error rate, predict when users finish speaking, classify interruptions, and read perceptual signals like synthetic speech, gender, and accent.
Krisp's VIVA SDK runs server-side directly in each customer's audio pipeline before speech-to-text.
New capabilities in VIVA 2.0 include the following:
- Turn Prediction v3: A multilingual model that predicts end-of-turn from audio alone, reacts quickly to real turn-ends while holding through mid-sentence pauses, and runs on standard CPUs or locally on-device for robotics and conversational toys.
- Interrupt Prediction v1: An audio-only classifier that predicts when a user intends to interrupt the agent (start-of-turn prediction) and distinguishes intent to take the floor from backchannel speech like "yes" or "mhm."
- Voice Isolation v3: An upgraded model that delivers measurable improvements in downstream word error rate.
- Signal Detectors: A new category of real-time audio models that give voice AI the perceptual cues humans use without thinking. The three models launching with VIVA 2.0 are TTS Detector, which identifies synthetic speech in real time and recognizes when an inbound voice AI agent or IVR picks up; Accent Detector, which identifies the speaker's accent so audio can be routed to the STT model best tuned for it; and Gender Detector, which identifies speaker gender to enable personalized responses. All models run on standard server CPUs and operate on audio input alone with no transcription required.
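To make the routing and interruption ideas in the list above concrete, here is a minimal Python sketch of how an application might act on such signals. It does not use the Krisp VIVA SDK: the `Signals` shape, the accent labels, the STT model names, and the confidence threshold are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical accent labels and STT model names; none of these come from the Krisp VIVA SDK.
STT_BY_ACCENT = {
    "en-US": "stt-general-us",
    "en-IN": "stt-indian-english",
    "en-GB": "stt-british-english",
}
DEFAULT_STT = "stt-general-us"


@dataclass
class Signals:
    """Outputs an audio-only detector stack might emit (assumed shape, not a real API)."""
    accent: str               # e.g. "en-IN"
    accent_confidence: float  # 0.0 to 1.0
    is_synthetic: bool        # verdict of a synthetic-speech (TTS) detector
    interrupt_kind: str       # "take_floor", "backchannel", or "none"


def choose_stt_model(signals: Signals, min_confidence: float = 0.7) -> str:
    """Route to an accent-tuned STT model only when the accent guess is confident."""
    if signals.accent_confidence < min_confidence:
        return DEFAULT_STT
    # Unknown accents fall back to the default model.
    return STT_BY_ACCENT.get(signals.accent, DEFAULT_STT)


def agent_action(signals: Signals) -> str:
    """Decide what the agent does while it is speaking."""
    if signals.is_synthetic:
        # Another agent or an IVR picked up, so switch to machine handling.
        return "handle_machine"
    if signals.interrupt_kind == "take_floor":
        # The user wants the floor, so the agent yields.
        return "stop_speaking"
    # Backchannel ("yes", "mhm") or silence: the agent keeps talking.
    return "keep_speaking"
```

Falling back to a default STT model below the confidence threshold is a design choice in this sketch, so that an uncertain accent guess does not route audio to a poorly matched model.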
The Krisp VIVA SDK processes more than 12 billion minutes of voice AI agent traffic a year and is embedded in more than 130 voice AI products, with customers including Daily, Vapi, LiveKit, Ultravox, and Telnyx, as well as AI labs and contact centers.
"Voice is becoming the primary interface between humans and AI," said Robert Schoenfield, executive vice president of licensing and partnerships at Krisp, in a statement. "Those conversations don't happen in clean environments. They happen in the real world, shaped by noise and subtle human cues. VIVA brings that layer into the system, so voice agents can operate the way people actually speak."