Spanning decades, AT&T's Watson engine continues to evolve and entice developers.
"Mr. Watson, come here, I want you."
Alexander Graham Bell spoke those famous words to his assistant, Thomas A. Watson, during Bell's first successful use of his invention, the telephone, in 1876. At that moment, the speech technology industry was born. And despite the industry's significant growth since then, Thomas A. Watson's early contribution to speech technology is still honored today with AT&T's Watson technology, a speech and natural language platform that has been decades in the making.
At a basic level, AT&T's Watson is a speech and natural language engine that processes and analyzes speech input, performs at least one service, and returns a result in real time.
AT&T's Watson is frequently confused with IBM's Watson, an artificial intelligence-based computer system that is able to answer questions asked in natural language. "IBM's Watson is more of a question-and-answer application, whereas AT&T's Watson is a platform," Mazin Gilbert, assistant vice president of technical research at AT&T Labs, says.
"The way we have architected AT&T Watson is that we realized that to build an application or a service, which is the ultimate goal here, there are many different technologies that need to be working synchronously. This is how you get a seamless, ubiquitous experience," Gilbert adds.
To that end, AT&T's Watson can combine speech (e.g., "Find the closest Starbucks") with other modalities, such as touch screens, text, facial recognition, audio files, and gestures. It can also leverage various technologies, such as natural language understanding, automatic speech recognition, text-to-speech, dialogue management, and voice biometrics.
AT&T Launches Watson APIs
While AT&T's Watson has been used in IVRs for more than 20 years, it wasn't until July 2012 that the company opened the technology to outside developers by launching speech APIs through its developer program. The APIs enable developers to create apps and services with voice recognition and transcription capabilities, and include an open, generic API, "a sort of holy grail" of being able to transcribe speech to text, Gilbert says. "That API is trained on a million-plus words and hundreds of thousands of speakers, and that's available to developers who want to do speech recognition and don't have a clear notion of what application they need," he says.
With these APIs, developers don't have to be specifically skilled in creating speech apps; they can send audio to AT&T, which returns the text of what the end user said. "We're doing this so people don't have to reinvent the wheel," Gilbert said in an earlier interview with Speech Technology.
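As a rough illustration of that round trip, the Python sketch below builds (but does not send) an HTTP request that carries recorded audio to a speech-to-text service. The endpoint URL, the header choices, and the function name are placeholders assumed for illustration; they are not taken from AT&T's documentation.

```python
import urllib.request

# Hypothetical endpoint; substitute the URL from the provider's documentation.
SPEECH_TO_TEXT_URL = "https://api.example.com/speech/speechToText"


def build_transcription_request(audio_bytes: bytes, access_token: str,
                                content_type: str = "audio/wav") -> urllib.request.Request:
    """Build a POST request that carries raw audio to a speech-to-text service."""
    if not audio_bytes:
        raise ValueError("audio_bytes must not be empty")
    headers = {
        "Authorization": f"Bearer {access_token}",  # assumed bearer-token auth
        "Content-Type": content_type,               # format of the audio payload
        "Accept": "application/json",               # ask for a JSON transcript
    }
    return urllib.request.Request(SPEECH_TO_TEXT_URL, data=audio_bytes,
                                  headers=headers, method="POST")


if __name__ == "__main__":
    request = build_transcription_request(b"placeholder-audio", "example-token")
    print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it; the response format would depend on the actual service.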
The AT&T Developer Platform provides access to software development kits and code samples for several environments, such as Microsoft Visual Studio, and is compatible with a number of mobile platforms.
"The whole world of mobile development is so complicated now because of all the different options," says Deborah Dahl, principal at Conversational Technologies, chair of the Multimodal Coordination Group, and cochair of the Hypertext Coordination Group at the World Wide Web Consortium.
Dahl says that among mobile intelligent virtual assistants, Nuance's Nina offers the most customizable vocabulary. However, she maintains that AT&T's Watson offers a variety of contexts. Currently, there are nine speech contexts available: gaming, business search, social media, TV, Web search, general purpose or generic, voicemail to text, SMS, and question and answer.
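To make the idea of choosing a context concrete, here is a minimal Python sketch that models the nine contexts listed above and falls back to the generic one when a requested context is not recognized. The enum, its string values, and the `choose_context` helper are hypothetical names invented for this example, not identifiers from AT&T's API.

```python
from enum import Enum


class SpeechContext(Enum):
    """The nine speech contexts named in the article; string values are illustrative."""
    GAMING = "Gaming"
    BUSINESS_SEARCH = "BusinessSearch"
    SOCIAL_MEDIA = "SocialMedia"
    TV = "TV"
    WEB_SEARCH = "WebSearch"
    GENERIC = "Generic"
    VOICEMAIL_TO_TEXT = "VoicemailToText"
    SMS = "SMS"
    QUESTION_AND_ANSWER = "QuestionAndAnswer"


def choose_context(name: str) -> SpeechContext:
    """Return the matching context, falling back to the generic one."""
    normalized = name.strip().replace(" ", "_").replace("-", "_").upper()
    return SpeechContext.__members__.get(normalized, SpeechContext.GENERIC)
```

Under these assumptions, `choose_context("business search")` yields `SpeechContext.BUSINESS_SEARCH`, while an unlisted name such as `"weather"` falls back to `SpeechContext.GENERIC`.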
As an example, the business search context is trained on tens of millions of local business entries, and lets users transcribe search queries. The question-and-answer context is trained on 10 million questions and enables users to transcribe questions and have the correct answer returned. "AT&T's option is a good balance between price, ease of use, and flexibility as far as customizing it for your own application," Dahl says. "The things that stand out to me are the levels of customization and the number of environments.
"The combination of these modalities is key to providing a ubiquitous experience because, given these different environments, such as sitting in your car, you want to have a very different experience than using it at home," Dahl says. "In some cases, more than one modality is required to fulfill an action. I'm not aware of a platform that brings all this rich technology into a single framework."
The API program has proved to be popular; currently there are 43,000 developers using the platform. AT&T offers a free 90-day trial period and a yearly subscription for $99. AT&T already uses Watson in several of its own products and partnerships, including its home automation and security product, AT&T Digital Life; AT&T U-verse Easy Remote for television; and speech-connected cars with partners GM and QNX Software Systems.
With QNX, Watson's speech engine analyzes words spoken by a driver and fits them into known patterns. What has been said is then routed from the cloud to the car. The in-vehicle intent engine from QNX performs the rest of the speech analysis to figure out how to act.
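The division of labor described above can be sketched in Python as two stages: a server-side step that turns audio into text, and a client-side step that maps the text to an action. Everything here is a simplified assumption for illustration: the server stage is a stub that merely decodes bytes, and the regular-expression patterns stand in for a far richer in-vehicle intent engine. None of the names come from AT&T or QNX.

```python
import re
from dataclasses import dataclass
from typing import Optional


def server_transcribe(audio: bytes) -> str:
    """Stand-in for the cloud speech engine: pretend the audio decodes to text."""
    return audio.decode("utf-8").strip().lower()


@dataclass
class Intent:
    action: str
    argument: str


# Hypothetical in-vehicle patterns, checked in order.
INTENT_PATTERNS = [
    ("navigate", re.compile(r"^(?:navigate|drive|take me) to (?P<arg>.+)$")),
    ("call", re.compile(r"^call (?P<arg>.+)$")),
    ("play", re.compile(r"^play (?P<arg>.+)$")),
]


def client_resolve_intent(transcript: str) -> Optional[Intent]:
    """Client-side stage: decide how to act on the text returned by the server."""
    for action, pattern in INTENT_PATTERNS:
        match = pattern.match(transcript)
        if match:
            return Intent(action, match.group("arg"))
    return None  # no known pattern matched


def handle_utterance(audio: bytes) -> Optional[Intent]:
    """Run the full path: server transcription, then in-vehicle intent resolution."""
    transcript = server_transcribe(audio)
    return client_resolve_intent(transcript)
```

Keeping intent resolution on the client means only the recognition step depends on the network, which is one plausible reason for splitting the work this way.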
"Sharing the workload across client and server offers automotive manufacturers and end users the best of both worlds," said Andy Gryc, automotive product marketing manager at QNX Software Systems, in a statement. "The server-side analysis, provided by AT&T Watson, is optimized for complex scenarios, such as a navigation application in which the driver