February 10, 2014
By Sara Basson, Program Director, Speech Transcription Strategy - IBM Research
The View from AVIOS

Building Smarter Systems with Cognitive Computing

When speech technology was first invented, it was assumed that it would revolutionize user interfaces. It has since become clear that speech is an enabling technology, with less perceived value as a stand-alone offering. Its real value emerges from uses that benefit profoundly from the presence of speech.

Adoption has been slower than many of us would have predicted, for a number of reasons.

First, when people extol the value of speech-based applications, they are, for the most part, assuming "language understanding" (explicitly or implicitly) along with speech recognition. Speech recognition converts human speech to text; language understanding interprets what has been said. Speech as a stand-alone offering is just not that compelling. Most of the population is highly keyboard- and touchpad-savvy, which limits the advantage of speech over touch to situations where users cannot look at or touch a device, as when driving.

Another obstacle is performance. Since the business value of speech as a stand-alone offering has been questionable, speech can only win the battle for heartshare if performance is spectacular: taking less time than text and requiring less error correction or learning time. While speech recognition performance has dramatically advanced, we have not yet crossed that extremely high bar.

Even when speech recognition does perform according to all expectations, there is the thorny problem of understanding language as well as speech, and then the problem of what the system will do in response. As an example, I have been consistently impressed with the performance of Apple's Siri. Granted, over years in this industry, I have learned to "talk" to these systems to maximize success. Even so, my perfectly transcribed speech frequently does not lead to a perfectly delivered solution. More often than not, Siri throws up its virtual arms in despair and routes me to the Web for a search.

Enter the world of smart machines, or "cognitive computing," as IBM calls it. Cognitive computing refers to brain-inspired technologies that address multiple human goals, including seamless device-human interaction. Cognitive systems will be expected to use complex reasoning and interaction to extend human capabilities. They will be able to learn and reason, discover and decide, and interact naturally with humans.

Most technology users would agree that machines do a poor job of modeling human cognition. Users are forced to model the "thought processes" of the machine; the machine does not adapt to the unscripted, unstructured methods of human communication. As computers advance to better model human cognition, a plethora of new uses becomes possible. Computers will be able to become virtual personal assistants.

Once machines become truly intelligent, speech and language interfaces will actually make sense. Dialoguing with intelligent systems will become a preferred mode of interaction. As devices become smaller, they will lose the option of touch or type interfacing, even for highly facile texters. Google Glass already demonstrates this.

There is a key difference between the cognitive systems under discussion today and the artificial intelligence of the 1980s. In the earlier incarnations of machines designed to "think" like humans, the activities for a particular application needed to be explicitly programmed. This resulted in brittle systems that required extraordinary programming talent to move to new domains. In the current vision, the machines won't be programmed by humans to perform specific tasks; they will learn autonomously, by interacting with data and humans, to perform new tasks. Many research institutions, including IBM, have projects under way to design new architectures for computers, inspired by the human brain.

The key to these systems is not their ability to communicate through speech and language, but that they will be intelligent, personalized, and constantly learning about the environment, about their users, and about those users' preferences and expectations.

Speech and language technologies will not drive the era of cognitive systems, but cognitive systems may well drive the era of speech and language interaction. Perhaps it wasn't enough for speech technology to get smarter and better. The underlying systems needed to be smarter and better, in order to take adequate advantage of speech interfaces. Welcome to the world of smarter machines and cognitive computing. Look forward to a plethora of new services, with speech and language technologies front and center.

Sara H. Basson, Ph.D., is the worldwide program director, research process services, at IBM, and is on the board of directors of the Applied Voice Input/Output Society (AVIOS).