Speech Technology Magazine

 

The 2015 Speech Industry Star Performers: IBM

By Oren Smilansky - Posted Aug 7, 2015

IBM Welcomes Developers for Watson to Advance Cognitive Computing

IBM stands out as a key performer this year for the progress it made in cognitive computing, which followed its February announcement that it would open its speech-to-text and text-to-speech technologies to developers.

In July, the company further opened the Watson Developer Cloud, which boasts 280 partners and tens of thousands of developers, to encourage advancements in cognitive computing, and added several noteworthy new functions that promise to improve speech technology.

Among these is the speech-to-text service, which sharpens the precision with which systems can convert conversational speech to text. The service includes an application programming interface (API) that allows users to add speech transcription capabilities to various mobile and Web applications. Users can transcribe audio streamed from a microphone or upload digital files of recorded audio. The system relies on machine intelligence to apply information about language structure and grammar rules, which yields a more accurate transcription than was previously possible. According to a blog post from Michael Picheny, senior manager of Watson Multimodal at the Thomas J. Watson Research Center, the "breakthrough" advances will enable "a system capable of very low error rates." A hypothetical use for this feature would be transcribing customer service calls to learn more about caller sentiment.
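
To put "error rates" in concrete terms: speech recognition accuracy is commonly reported as word error rate, the number of word substitutions, deletions, and insertions needed to turn the system's transcript into a correct reference transcript, divided by the number of words in the reference. The article does not say which metric IBM uses, so the Python sketch below simply assumes word error rate; the function name and the sample sentences are illustrative and are not drawn from IBM's service.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in reference.

    Illustrative helper only; not part of any IBM API.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (standard Levenshtein recurrence).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of four in the reference.
    print(word_error_rate("please transfer my call", "please transfer the call"))
```

By this measure, a transcript that gets one word wrong in a four-word utterance has an error rate of 25 percent, which is why small gains matter on long customer service calls.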

Similarly, the company introduced the reverse capability with its text-to-speech service. Equipped with an API that draws on IBM's speech-synthesis tool, the service converts text to audio that strives to sound more convincingly human. Using a representational state transfer (REST) interface, the system enables users to convert written text into conversational speech in a number of voices (male or female), accents, and languages, including English, Italian, German, and Spanish. The tool makes it possible for applications to communicate in a more natural-sounding voice "complete with appropriate cadence and intonation," according to the IBM Watson Web site. Aside from the obvious use cases (it will no doubt prove useful to the visually impaired, for instance, or to those who don't read well in a certain language), it aims to make automated voices easier on the ears. The company suggests this will become more important as the Internet of Things (IoT) ramps up and people begin to interact with their devices more frequently than ever before. (And considering that IBM announced in March that it plans to invest $3 billion in IoT, it's probably safe to take the company at its word.)
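
For readers unfamiliar with REST interfaces, the Python sketch below shows roughly what such a call looks like from a developer's side: the text and the desired voice travel as parameters in an ordinary Web request, and the reply is an audio file. Everything specific here is an assumption made for illustration, including the endpoint URL, the parameter names, and the voice identifier; none of it is taken from IBM's documentation. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, used only for illustration (not IBM's real URL).
BASE_URL = "https://tts.example.com/v1/synthesize"


def build_synthesis_request(text: str, voice: str,
                            audio_format: str = "audio/wav") -> Request:
    """Build (but do not send) a GET request asking for synthesized speech.

    The parameter names "text" and "voice" are assumptions, as is the use of
    the Accept header to pick the audio format.
    """
    query = urlencode({"text": text, "voice": voice})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Accept": audio_format},
        method="GET",
    )


if __name__ == "__main__":
    # "en-US_female" is a made-up voice identifier.
    request = build_synthesis_request("Your order has shipped", "en-US_female")
    print(request.get_method(), request.full_url)
```

A real service would also require credentials, and the returned audio would be saved or played by the calling application.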

IBM Watson's Language Translation service, known as Machine Translation in its first iteration, can instantly identify the language in which a text input was composed. It is unique in that it can translate across multiple domains, including "news, patents, or conversational documents," according to the IBM Watson Developer Cloud Web site. Using this service, a help desk representative who speaks English can communicate with someone who speaks another language during a direct chat session. Community managers and Webmasters likewise can improve communication on forums, as the service allows them to publish content so that users in one country can read messages posted by users in other countries and seek advice on a multitude of topics.

While the customer service benefits of these tools have yet to be determined, they'll likely have a big impact. "Our long-term goal is to build a system that can learn over time," Picheny wrote in the blog post.
