The 2017 Speech Industry Star Performers: IBM

IBM Raises the Speech Accuracy Bar

When it comes to speech solutions, accuracy has long been the Holy Grail. Companies have strived to re-create near-human-level understanding, if not surpass it. No company operating in the speech arena has come closer to that standard than IBM. The company recently made the bold announcement that its speech recognition system has reached what it is calling a new industry record word error rate of just 5.5 percent on the Switchboard conversational speech benchmark, lower than the 6.9 percent achieved a year ago and near the 5.1 percent error rate of humans. The company credits improvements in deep learning and computer processing for the increased speech capability.
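
For readers unfamiliar with how such figures are scored, here is a minimal sketch. It assumes the percentages are standard word error rates: the number of substituted, deleted, and inserted words needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. The function name and the sample sentences are illustrative only and do not come from IBM.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output (deletion)
                curr[j - 1] + 1,     # extra word in the output (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one of six reference words is dropped by the recognizer.
rate = word_error_rate("the cat sat on the mat", "the cat sat on mat")
print(f"{rate:.1%}")
```

By this measure, a 5.5 percent rate corresponds to roughly one word in 18 being wrong, missing, or spurious.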

“The important thing is that these [artificial intelligence] technologies are producing increasing improvement in cognitive learning and speech recognition,” says Michael Picheny, senior manager of IBM Watson Multimodal. “It used to take many years for these types of improvements, but deep learning is very powerful.”

“The ability to recognize speech as well as humans do is a continuing challenge, since human speech, especially during spontaneous conversation, is extremely complex,” added Julia Hirschberg, professor and chair of the Department of Computer Science at New York’s Columbia University, in a statement. “When we compare automatic recognition to human performance, it’s extremely important to take both these things into account: the performance of the recognizer and the way human performance on the same speech is estimated. This scientific achievement is in its way as impressive as the performance of [IBM’s] current ASR technology and shows that we still have a way to go for machines to match human speech understanding.”

IBM will not be deterred, though, as it continues to develop the speech capabilities connected to its Watson cognitive computing platform. Last fall, the company rolled out the Watson Virtual Agent, a cognitive conversational technology that allows businesses to easily build and deploy conversational agents.

Watson Virtual Agent allows users to build and train engagement bots from the cloud, harnessing the power of cognitive technologies and deep natural language processing capabilities that can be used to assist consumers.

Watson Virtual Agent joins existing Watson services, such as Watson Tone Analyzer, Watson Speech to Text, and Watson Text to Speech, offering users a full suite of cognitive capabilities to build conversational agents and other solutions.

So it’s no surprise, then, that IBM’s Watson technology is growing in use. Recently, Harman Professional Solutions and IBM Watson Internet of Things (IoT) unveiled Voice-Enabled Cognitive Rooms for medical facilities, corporate offices, hotels, cruise ships, and other hospitality environments. The initiative is under way at IBM’s Watson IoT Global headquarters in Munich, Germany, and the first Voice-Enabled Cognitive Rooms for enterprise applications are expected this year.

IBM’s cognitive technologies are embedded into Harman soundbars and alarm clocks, with which a user can interact using natural language, asking questions or issuing voice commands. The system is activated by a customizable wake-up word. Requests are then sent to the Watson cloud and Watson IoT Services to allow the user to control the in-room subsystems. The Cognitive Rooms system gets smarter about the user and that user’s preferences over time.

The Voice-Enabled Cognitive Rooms also function as in-room concierges that can answer general or site-specific questions. Users can even employ Watson for service requests, including amenity replenishments, restaurant reservations, late checkout, room service, shuttle service, and more.

“Our platform continues to evolve as we tap IBM’s science and research capabilities to enhance our services,” said David Kenny, general manager of IBM Watson, in a statement. “IBM is advancing the technologies on the platform in the area of conversation, all with a mission of helping brands transform their customer experience, empowering users to create solutions that deepen engagement and facilitate stronger, more positive interactions.”
