AT&T is readying a June release of several of its Watson Speech application programming interfaces (APIs) that developers can access to quickly create new apps and services with voice recognition and transcription capabilities.
"The whole idea is to allow developers to build applications in speech where there's minimal or no knowledge of speech. This becomes just a simple API where all the intelligence and professional services are sitting in our cloud," says Mazin Gilbert, assistant vice president of Technical Research at AT&T Labs, who is responsible for all of AT&T's speech research. "These are APIs that have been trained on very large amounts of data for specific applications."
The first set of APIs will focus on seven areas: Web search, local business search, question and answer, voicemail to text, SMS, U-verse electronic programming guide, and dictation for general use of speech recognition.
"This includes an open speech or generic API, and that is sort of the holy grail of being able to transcribe speech into text," Gilbert says. "That API is trained on a million-plus words and hundreds of thousands of speakers, and that's available to developers who want to do speech recognition and don't have a clear notion of what application they need."
AT&T is also giving developers a software development kit (SDK), which they can use to build software that captures a user's spoken words and sends them into the network for transcription.
"We're providing the software that goes into your application, and this software basically talks to our API that sends speech in real time and is able to recognize it," Gilbert says. "Some developers want to build their own APIs, they want to specialize in their platform; some of them don't have that expertise and they want to pull their software into their application. We're doing this so people don't have to reinvent the wheel."
Initially, the APIs will be available for Android and iOS, with more AT&T Watson Speech APIs coming for areas such as gaming, social media, speaker authentication, and language translation. "We will be having other APIs coming out this year that allow you to do a lot more custom type of capabilities that you can't do today," Gilbert says.
As for the development of the AT&T Watson Speech APIs, Gilbert points out that the company has made significant investments in research and development and has more than 600 patents, but that it still welcomes outside input.
"We're trying to open this up to others," he says. "The whole idea is that there are a lot of smart people out there who can take incredible applications and services that we could never figure out ourselves. We're trying to energize that community to allow them to use our APIs, speech, and language technologies without having to make any investment at all."