Speech Technology Magazine


Amazon Says Semi-Supervised Learning Builds Better Speech Recognition

Amazon reveals how it developed an acoustic model for Alexa, and used semi-supervised learning to improve speech recognition.
Posted Apr 9, 2019

Last week, Amazon announced that it had used a large unlabeled data set to expose Alexa to a variety of human sounds, in an effort to improve speech recognition. The data set is thought to be one of the largest ever used to train an acoustic model. Amazon said in a blog post, "We developed an acoustic model, a key component of a speech recognition system, using just 7,000 hours of annotated data and 1 million hours of unannotated data."

The Amazon scientists say they used semi-supervised learning, in which sounds labeled by human beings are combined with sounds labeled by machines to train the artificial intelligence engine. According to Amazon, the result was a 10% to 22% reduction in speech recognition errors. The scientists say this method works better than using sounds labeled only by machines.
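The general recipe behind this kind of semi-supervised training is often called teacher-student pseudo-labeling: a model trained on the small human-labeled set assigns labels to the large unlabeled pool, and a new model is then trained on both together. The sketch below illustrates that loop with a toy one-dimensional threshold classifier; all data, names, and the "model" are hypothetical illustrations, not Amazon's actual system.

```python
# Toy sketch of teacher-student pseudo-labeling (semi-supervised learning).
# The threshold "model" and the data are hypothetical; a real acoustic model
# would be a large neural network trained on audio features.

def fit_threshold(samples, labels):
    """Fit a 1-D threshold classifier: predict 1 if x >= threshold."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(samples):
        acc = sum((x >= t) == bool(y) for x, y in zip(samples, labels)) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def predict(threshold, samples):
    return [1 if x >= threshold else 0 for x in samples]

# Small human-annotated set and a larger unannotated pool.
annotated_x = [0.1, 0.2, 0.8, 0.9]
annotated_y = [0, 0, 1, 1]
unannotated_x = [0.15, 0.3, 0.7, 0.85, 0.05, 0.95]

# 1. Train a teacher on the annotated data only.
teacher = fit_threshold(annotated_x, annotated_y)

# 2. Let the teacher machine-label (pseudo-label) the unannotated pool.
pseudo_y = predict(teacher, unannotated_x)

# 3. Train a student on annotated plus pseudo-labeled data combined.
student = fit_threshold(annotated_x + unannotated_x, annotated_y + pseudo_y)

print(predict(student, [0.25, 0.9]))
```

The key idea is step 2: the teacher's predictions stand in for human annotation, which is what lets a system scale from thousands of annotated hours to a million unannotated ones.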

According to the blog post, “Compared to a model trained only on the annotated data, our semi-supervised model reduces the speech recognition error rate by 10% to 22%, with greater improvements coming on noisier data. We are currently working to integrate the new model into Alexa, with a projected release date of later this year.”
