
Microsoft Launches MAI Models for Speech and Voice


Microsoft has launched MAI-Transcribe-1 and MAI-Voice-1, two artificial intelligence speech models for speech recognition and voice generation.

MAI-Transcribe-1 is designed for speech-to-text transcription across 25 languages, including English, French, German, Italian, Spanish, Hindi, Portuguese, Czech, Danish, Finnish, Hungarian, Dutch, Polish, Romanian, Swedish, Japanese, Korean, Chinese, Arabic, Indonesian, Russian, Thai, Turkish, and Vietnamese. The model is optimized for real-world environments and delivers batch transcription speeds 2.5 times faster than Microsoft's previous Azure Fast offering. It already powers voice mode in Microsoft Copilot and can be used for live captioning, call center transcription, video subtitling, accessibility, e-learning, media archiving, and market research. The model can run either in the cloud or on premises.

MAI-Transcribe-1 can produce high-quality batch transcripts from MP3, WAV, and FLAC files. Microsoft says the model will soon support diarization, which identifies and separates speakers in recordings; contextual biasing, which helps the model prioritize domain-specific terminology and proper nouns; and real-time streaming.

MAI-Voice-1 offers hyper-realistic, natural, and expressive speech generation and can create custom brand voices from just one minute of audio, preserving speaker identity across long-form content. The model can generate up to 60 seconds of audio in one second and is built for high-efficiency enterprise use cases. It is already being integrated into Copilot experiences, including audio-based features and podcasts.

"At Microsoft AI, we're building humanist AI. We have a distinct view when creating our AI models, putting humans at the center, optimizing for how people actually communicate, training for practical use," Mustafa Suleyman, head of Microsoft's AI division, wrote in a blog post.

Both speech models will be available on Microsoft Foundry along with Microsoft's MAI-Image-2, the latest version of its image-generation technology.