Amazon Says Semi-Supervised Learning Builds Better Speech Recognition

Last week, Amazon announced that it had used a large unlabeled data set to expose Alexa to a wide variety of human speech, in an effort to improve its speech recognition. The data set is thought to be one of the largest ever used to train an acoustic model. Amazon said in a blog post, “We developed an acoustic model, a key component of a speech recognition system, using just 7,000 hours of annotated data and 1 million hours of unannotated data.”

The Amazon scientists say they used semi-supervised learning, which combines speech that has been tagged by human beings with speech that has been tagged automatically by machines, to train the artificial intelligence model. According to Amazon, the result was a 10% to 22% reduction in speech recognition errors. The scientists say this method works better than using sounds tagged only by machines.
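To make the general idea of semi-supervised learning concrete, here is a minimal toy sketch of self-training (pseudo-labeling): a "teacher" model is trained on a small human-labeled set, labels the unlabeled data, and only its confident guesses are added to the training set for a "student" model. This is not Amazon's acoustic-modeling pipeline. The one-dimensional "features", the nearest-centroid classifier, the function names, and the 0.75 confidence threshold are all hypothetical choices made for illustration.

```python
from statistics import mean

def train_centroids(examples):
    """Fit a toy nearest-centroid classifier: one mean feature value per label."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Return (label, confidence); confidence grows with the distance margin (0.5 to 1.0)."""
    ranked = sorted(centroids, key=lambda label: abs(x - centroids[label]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, 1.0
    d_best = abs(x - centroids[best])
    d_next = abs(x - centroids[ranked[1]])
    total = d_best + d_next
    return best, (d_next / total if total > 0 else 0.5)

def self_train(labeled, unlabeled, threshold=0.75):
    """Hypothetical self-training loop: teacher labels data, student learns from both sets."""
    teacher = train_centroids(labeled)  # trained on human-annotated data only
    pseudo = []
    for x in unlabeled:
        label, conf = predict(teacher, x)
        if conf >= threshold:  # keep only confident machine-generated labels
            pseudo.append((x, label))
    student = train_centroids(labeled + pseudo)  # human labels plus machine labels
    return teacher, student, pseudo

labeled = [(0.0, "a"), (10.0, "b")]     # small annotated set (hypothetical)
unlabeled = [1.0, 2.0, 5.0, 8.0, 9.5]   # larger unannotated set (hypothetical)
teacher, student, pseudo = self_train(labeled, unlabeled)
print(pseudo)   # the ambiguous point 5.0 is left out
print(student)
```

The confidence filter is the design choice that matters most here: machine-generated labels are noisy, so discarding the ambiguous ones keeps the student from learning the teacher's mistakes.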

According to the blog post, “Compared to a model trained only on the annotated data, our semi-supervised model reduces the speech recognition error rate by 10% to 22%, with greater improvements coming on noisier data. We are currently working to integrate the new model into Alexa, with a projected release date of later this year.”
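Speech recognition accuracy is usually reported as word error rate (WER), and improvements such as "10% to 22%" are commonly stated as relative reductions. The post quoted above does not spell out which kind it means, so treating the figures as relative is an assumption. The sketch below computes WER with a word-level edit distance and then a relative reduction. The baseline and improved error rates are hypothetical numbers chosen to show the arithmetic, not Amazon's results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)

def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which the error rate fell, relative to the baseline."""
    return (baseline - improved) / baseline

wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
print(wer)  # one deletion plus one substitution over five words

# Hypothetical: a 10% baseline WER falling to 9% or to 7.8%
print(relative_reduction(0.10, 0.09))
print(relative_reduction(0.10, 0.078))
```

Under this reading, a 22% improvement would not mean 22 percentage points fewer errors; a model with a 10% WER would drop to about 7.8%.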
