IBM Opens Speech and Cognitive Computing APIs to Watson


IBM is courting developers in a big way. In February, the company lifted the veil on its speech technology solutions, inviting Watson developers to use its speech-to-text and text-to-speech technology. In March, Big Blue acquired AlchemyAPI, which provides cognitive computing application programming interface (API) services and deep learning technology.

Michael Picheny, senior manager of the speech and language algorithms group at IBM’s T.J. Watson Research Center, explained in a blog post, "With speech services, you can build a spoken conversational interface to Watson, or produce transcripts from running speech that can be processed by other Watson services, such as machine translation, question and answer, and relationship extraction."
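In practice, feeding a transcript into downstream services starts with a single HTTP call to a speech-to-text endpoint. The sketch below is a minimal illustration only; the endpoint URL, headers, and parameter names are assumptions for illustration, not the documented Watson API, and the request is assembled rather than sent.

```python
# Minimal sketch of preparing a speech-to-text HTTP request.
# The endpoint URL, header, and parameter names here are illustrative
# assumptions, not the documented Watson API.

def build_transcription_request(audio_path, model="en-US_Broadband"):
    """Assemble the parts of a hypothetical /recognize call."""
    return {
        "method": "POST",
        "url": "https://speech.example.com/api/v1/recognize",  # hypothetical
        "headers": {"Content-Type": "audio/wav"},
        "params": {"model": model},
        "audio_file": audio_path,
    }

request = build_transcription_request("call_center_clip.wav")
print(request["method"], request["params"]["model"])
```

The resulting transcript text could then be passed, as Picheny describes, to further services such as translation or relationship extraction.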

Vince Padua, director of product management and strategy at IBM Watson, says opening the door to developers made sense, given the company’s long history with speech technology. That history dates back to the early 1960s and a project called the Shoebox machine, widely hailed as a forerunner of modern voice recognition systems. Shoebox was an experimental machine capable of responding to basic voice commands to perform math calculations.

"Speech is being seen as a critical element for a lot of applications that people are talking to us about, certainly in the world of mobile and social," Padua says.

Sharing the technology is also part of the reasoning behind the AlchemyAPI acquisition. Another tie-in is the IBM Watson Developer Cloud, which Padua notes was made publicly available in October with a series of data-level APIs for processes such as concept expansion, message resonance, and text-based natural language processing.

The intent of the developer cloud, Padua says, is to essentially expand the availability of the same Watson assets and technology that appeared on the TV game show Jeopardy! in 2011 and now serve as the technology behind products such as IBM Engagement Advisor. "It's to bring those directly into the hands of developers for them to access Watson and build their own applications," he says.

Padua says AlchemyAPI's deep learning platform uses techniques very similar to IBM's, though not all of the algorithms are the same. One piece AlchemyAPI lacked was speech capability, which IBM is bringing to the table.

"AlchemyAPI is a great addition to our portfolio," Padua says. "It accelerates our services that do things like language, NLP [natural language processing], and sentiment extraction, and it brought in new technology that IBM was working on in beta form, such as visual and image recognition."

Padua explains that AlchemyAPI works through deep learning approaches and convolutional neural networks, machine learning models that respond to overlapping regions of a visual field across multiple stacked layers. Convolutional neural networks are widely used for image and video recognition.

"Convolutional is probably the most adopted neural network by most people," he says. "If you look at the stuff that we're doing around our speech APIs, we use a very similar technique."

"There are elements of statistical learning models that are employed," Padua says. "Any of the technologies that we're working on requires data. The approach that AlchemyAPI uses is through convolutional neural networks to bring language and images together and provide a more rich sense of what all the information is. Words can provide more insight into the images, and the description of the images can provide more context, more modeling of what the words are themselves."
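The convolution operation Padua describes can be sketched in a few lines. Below is a minimal, dependency-free Python illustration of a single filter sliding over overlapping windows of a 2-D image; real convolutional networks stack many such filters with learned weights, nonlinearities, and downsampling.

```python
def conv2d(image, kernel):
    """Apply one filter over every overlapping window of a 2-D image
    (valid padding, stride 1) -- the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            # Sum of elementwise products over the overlapping window.
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A 4x4 "image" whose values increase left to right.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
# A simple vertical-edge filter: responds to horizontal intensity change.
kernel = [[1.0, -1.0], [1.0, -1.0]]
features = conv2d(image, kernel)
print(len(features), len(features[0]))  # 3 3
```

Because every window overlaps its neighbors, nearby features share evidence; the same mechanism applies to speech when the input is a spectrogram rather than a photo, which is consistent with Padua's remark that IBM's speech APIs use a very similar technique.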

Many use cases from thousands of Watson cloud developers and AlchemyAPI developers are consumer-oriented, for mobile and social use, Padua says, and convolutional neural networks work "very, very well" with them.

AlchemyAPI processes "billions" of API calls every month in 36 countries and eight languages: English, French, German, Italian, Portuguese, Russian, Spanish, and Swedish.

The idea of using Watson APIs to expand languages, the types of users, and age demographics is something Padua expects Watson developers to take advantage of. "You can have a speech-to-text or text-to-speech ability that is better suited for kids, for example, or teenagers or the elderly, and in multiple languages and accents," he says. "Those are all things … that we absolutely expect folks to do with our APIs.

"Since IBM has a deep background in the world of enterprise, we see this being used in a lot of different places, such as analysis of call center recordings, interactions of embedded devices, or the Internet of Things," Padua adds.

In late March, IBM announced plans to invest $3 billion over the next four years to create an Internet of Things business.
