The 2017 Speech Industry Star Performers: Microsoft
Microsoft Moves Speech APIs to Market
Microsoft this spring announced the long-awaited general availability of its Custom Speech Service and the Bing Speech API, moving them out of the limited public beta trials where they had spent months.
Both are part of Microsoft’s Cognitive Services portfolio of artificial intelligence and neural networking offerings, which has already been credited with Microsoft’s breakthroughs in speech recognition accuracy. Last October, researchers and engineers in Microsoft Artificial Intelligence and Research reported a speech recognition word error rate of just 5.9 percent.
Microsoft Custom Speech Service is aimed at developers of third-party applications. Prior to the release, Microsoft had spent more than a decade developing speech recognition technology capable of performing robustly in noisy environments as well as with the jargon, dialects, and accents of specific user groups.
In a blog post, Microsoft engineers explained that the Custom Speech Service lets users customize Microsoft’s speech-to-text engine by uploading their own text and/or speech data. For applications containing particular vocabulary items, such as jargon or product names that rarely occur in typical speech, users can improve performance by customizing language models. Similarly, customizing the acoustic model can enable the system to do a better job of recognizing speech in particular environments or from particular user populations. A voice-enabled app for use in a warehouse or factory, for example, might require a custom acoustic model that can better isolate speech from all of the background noise.
The Bing Speech API converts audio into text, understands intent, and converts text back to speech.
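For developers, calling the speech-to-text side of the Bing Speech API amounts to posting short audio to a REST endpoint with a subscription key. The sketch below builds such a request in Python; the endpoint URL, query parameters, and header names reflect Microsoft's public documentation for the service at the time and should be treated as illustrative, not a definitive integration.

```python
def build_recognition_request(subscription_key, language="en-US"):
    """Return the URL and headers for a short-audio speech-to-text request
    against the Bing Speech REST API (illustrative values)."""
    url = (
        "https://speech.platform.bing.com/speech/recognition/"
        "interactive/cognitiveservices/v1"
        "?language=" + language + "&format=detailed"
    )
    headers = {
        # Subscription key from the Azure portal authenticates the call.
        "Ocp-Apim-Subscription-Key": subscription_key,
        # The service expects WAV/PCM audio, 16 kHz, mono.
        "Content-Type": "audio/wav; codec=audio/pcm; samplerate=16000",
    }
    return url, headers

url, headers = build_recognition_request("YOUR_SUBSCRIPTION_KEY")
# To perform a real recognition, POST raw audio bytes to `url` with these
# headers, e.g. with the `requests` library:
#   resp = requests.post(url, headers=headers,
#                        data=open("speech.wav", "rb").read())
#   print(resp.json())
```

Intent understanding and text-to-speech use separate endpoints in the same portfolio; the request-building pattern is the same.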
Both the Custom Speech Service and the Bing Speech API enable developers to build the same voice-enabled machine intelligence that powers Microsoft’s Skype Translator, Bing search engine, and Cortana virtual assistant into the third-party applications that people use every day.
“Cognitive Services is about taking all of the machine learning and AI smarts that we have in this company and exposing them to developers through easy-to-use APIs so that they don’t have to invent the technology themselves,” said Mike Seltzer, a principal researcher in the Speech and Dialog Research Group at Microsoft’s research lab in Redmond, Wash., in a statement. “In most cases, it takes a ton of time, a ton of data, a ton of expertise, and a ton of computing to build a state-of-the-art machine-learned model.”
The entire collection of Cognitive Services stems from a drive within Microsoft to make its artificial intelligence and machine learning expertise widely accessible to developers to create satisfying experiences for end users, said Andrew Shuman, corporate vice president of products for Microsoft’s AI and research organization, in a statement.
According to Microsoft, more than 424,000 developers across 60 countries have already tried Cognitive Services. The company now offers 25 Cognitive Services, up from just four nearly two years ago.
Moving forward, Microsoft researchers are working to ensure that speech recognition performs well in more real-life settings, such as recognizing individual speakers when multiple people are talking at once. A longer-term goal is to teach computers not just to transcribe the words that come out of people’s mouths but to answer questions or take action based on what they are told.