Microsoft Advances Speech APIs

Microsoft earlier today unveiled a speech-related application programming interface (API) for public preview, with general availability of two related services due next month.

As part of the move, Microsoft brought Custom Speech Service from private preview with select developers to public preview, open to anyone who wants to try it, while Content Moderator and Bing Speech API will advance from public preview to general availability next month. Content Moderator and Bing Speech API had been in public preview for a couple of years, according to Xuedong Huang, chief speech scientist and technical fellow of the Microsoft AI and research group. All three are part of Microsoft's Cognitive Services portfolio of artificial intelligence tools.

"Microsoft is interested in moving technology into the hands of our customers; speech recognition is part of that pursuit," Huang says. "It's very easy to use. We already have 400,000 developers working with it, and that number keeps growing."

According to Microsoft, more than 424,000 developers across 60 countries have already tried Cognitive Services. The company now offers 25 Cognitive Services, up from just four nearly two years ago.

These tools will allow developers to add the same machine intelligence that powers Microsoft's Skype Translator, Bing search engine, and Cortana virtual assistant to third-party applications that people use every day.

"Cognitive Services is about taking all of the machine learning and AI smarts that we have in this company and exposing them to developers through easy-to-use APIs so that they don't have to invent the technology themselves," said Mike Seltzer, a principal researcher in the Speech and Dialog Research Group at Microsoft's research lab in Redmond, Wash., in a prepared statement. "In most cases, it takes a ton of time, a ton of data, a ton of expertise, and a ton of compute to build a state-of-the-art machine-learned model."

Microsoft has been working for more than a decade to develop speech recognition technology that performs robustly in noisy environments and handles the jargon, dialects, and accents of specific user groups and settings, according to the company. This technology is now available to developers of third-party applications through the Custom Speech Service.

The Content Moderator allows users to quarantine and review data, including images, text, and videos, to filter out unwanted material such as potentially offensive language or pictures. The Bing Speech API converts audio into text, understands intent, and converts text back to speech.

The entire collection of Cognitive Services stems from a drive within Microsoft to make its artificial intelligence and machine learning expertise widely accessible, so that developers can create delightful and empowering experiences for end users, said Andrew Shuman, corporate vice president of products for Microsoft's AI and research organization, in a prepared statement.

"Being able to have software now that observes people, listens, reacts, and is knowledgeable about the physical world around them provides an excellent breakthrough in terms of making interfaces more human, more natural, more easy to understand, and thus far more impactful in lots of different scenarios."

Huang said that business intelligence company Prism Skylabs used the Computer Vision API in its Prism Vision application, which helps organizations search through closed-circuit and security camera footage for specific events, items, and people. Another early user was VR game developer Human Interact.
