The 2015 Speech Industry Star Performers: Baidu

Baidu Improves Error Rates with Deep Speech

If you haven't heard of Baidu by now, chances are you will soon. The 15-year-old company-turned-conglomerate has been called the Google of China for its search engine, and, like Google in the U.S., it has grown at a tremendous rate. Baidu has a strong presence in a wide range of endeavors, from partnering with Uber to developing supercomputers. The company has become well known for multiple efforts in speech technology, including recognition accuracy, mobile, in-car systems, and even robots.

Baidu woke up the speech tech world in December 2014 when it announced that its California-based research arm had developed a technology dubbed "Deep Speech," which employs neural networks. In noisy environments such as restaurants, as well as in far-field and reverberant scenarios, Deep Speech's recognition accuracy bested that of competing services, including Apple Dictation, Bing Speech Services, Google Web Speech, and Wit.ai.

Baidu's tests at the time showed that in noisy environments, Deep Speech's word error rate was 10 percent better than that of its competitors. According to the company, Deep Speech was tested on 200 utterances, half of them recorded in quiet places and the other half in loud ones. While Deep Speech showed only a small improvement on so-called "clean" speech samples, the technology beat out the offerings from Apple, Bing, Google, and Wit.ai in noisy conditions "by a large margin," the company said.
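For readers unfamiliar with the metric, word error rate is conventionally computed as the number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not Baidu's scoring code; the function name and the sample sentences are hypothetical, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A lower value is better, and because insertions count as errors the rate can exceed 1.0 for very poor transcripts.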

"We still see a large gap between even the best speech recognition systems and human level performance," the company stated. "But by vastly scaling up a neural network architecture able to transcribe audio in an end-to-end fashion, Deep Speech has taken a step towards closing this gap."

Andrew Ng, chief scientist at Baidu's artificial intelligence research lab, said the results were achieved by using deep learning, which analyzes massive sets of data. According to Baidu, "a typical training set for the largest speech recognition systems is between 5,000 and 10,000 hours. Applying data augmentation methods, such as noise addition, to existing data sets created more than 100,000 hours of synthetic data for Deep Speech to learn from."
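The noise addition mentioned above can be pictured as mixing a recording of background noise into a clean utterance at a chosen signal-to-noise ratio. The sketch below shows one common way to do that on raw sample lists; it is an assumption-laden illustration, not Baidu's pipeline, and the function name, the SNR parameter, and the synthetic signals are all hypothetical.

```python
import math
import random


def add_noise(clean, noise, snr_db):
    """Return `clean` with `noise` mixed in at roughly `snr_db` decibels.

    Hypothetical helper: both inputs are plain lists of float samples.
    """
    if not clean:
        raise ValueError("clean must contain at least one sample")
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as clean")
    noise = noise[:len(clean)]

    clean_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        return list(clean)  # silent noise clip: nothing to mix in

    # Pick the scale so clean_power / (scale**2 * noise_power)
    # equals 10 ** (snr_db / 10).
    scale = math.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(clean, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-ins for real audio: a 440 Hz tone and Gaussian noise at 16 kHz.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    hiss = [rng.gauss(0.0, 1.0) for _ in range(16000)]
    noisy = add_noise(tone, hiss, snr_db=10)
    print(len(noisy))
```

Reusing each clean utterance with different noise clips and SNR values is how a modest set of recordings can be stretched into a much larger synthetic training set.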

Ng said in a statement: "I'm excited by this progress, because I believe speech will transform mobile devices, as well as the Internet of Things. This is just the beginning."

Indeed, Baidu has a host of areas where Deep Speech can be, or is already, deployed. In April, the company debuted DuWear, an operating system used in its new smart watches. The watch is activated with the wake-up phrase, "Hi Baidu," and allows wearers to carry out the same kind of tasks, like voice searches, that Android watches can perform.

And Baidu is once again pitting itself against Google: In 2014, the company teamed up with BMW, and it is reportedly launching a self-driving car before the end of the year.
