3Play Media Finds AI Has Led to Significant Improvements in Speech Recognition

Advances in artificial intelligence have significantly improved the accuracy of automatic speech recognition (ASR) technology, according to a report from 3Play Media, which recently tested 10 ASR engines.

"The advances in AI we've seen across industries have also had an impact on ASR," Chris Antunes, co-CEO and co-founder, 3Play Media, said in a statement. "Long-time industry leader Speechmatics and newer entrants AssemblyAI and Whisper performed at the top of the pack, with each excelling in different areas. This proves that not all engines are created equal; the training material and models matter; and that there is room at the top for multiple engines to specialize in different use cases."

Accuracy is the key component in captioning to ensure that individuals who are deaf or hard of hearing receive information that fully depicts the original content. The industry requirement for accessibility is 99 percent accuracy, but even the best engines performed well below that, indicating a continued need for human revision, the study found.

The study measured accuracy using word error rate (WER) and formatted error rate (FER). While WER is the standard measure of transcription accuracy, FER also takes into account formatting, sound effects, grammar, and punctuation. Even the best engines tested were only 82 percent accurate by FER, whereas the best reached 93 percent accuracy by WER.
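
For readers unfamiliar with the metric, WER is conventionally the number of word-level substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that conventional definition only; it is not 3Play Media's scoring method, which the report summary does not describe. The function name, the lowercasing step, and the sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / reference word count.

    Illustrative sketch only. The whitespace tokenization and lowercasing
    are assumptions, not the normalization used in the 3Play Media study.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: one substituted word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
accuracy = 1.0 - wer  # assumes "accuracy" is reported as 1 - WER
```

Under the assumption that accuracy is reported as one minus the error rate, a 93 percent accurate transcript would have roughly seven word errors per 100 reference words. A formatted error rate would presumably also score punctuation, capitalization, and sound-effect tags rather than normalizing them away, though the exact procedure is not given here.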

Additionally, the study identified a new problem: hallucinations, the tendency to generate text that has no basis in the audio.

As ASR improves, it's important to understand which engine is best for different use cases. Some nuances to consider include performance on different error types, transcription styles, formatting, and industry-specific content, 3Play Media executives concluded.
