Watson and Siri are only the beginning.
The most familiar application of natural language understanding (NLU) in the speech industry today is call routing in call centers. Natural language processing techniques allow callers to speak to IVRs in their own words, without having to phrase their requests in the limited ways the system is expecting. These systems are extremely robust because, when the goal is to sort natural spoken requests into categories such as billing, technical support, or sales, only the words relevant to the categories need to be recognized and understood. During development, statistical analysis of thousands of caller questions reveals the words and combinations of words most likely to be associated with each category. Callers' questions are then matched against this training data to identify the best-matching category. Typically, these systems are very effective and users are very satisfied with them. But we are now seeing developments that will soon take NLU well beyond statistical call routing.
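The statistical matching described above can be sketched as a simple bag-of-words classifier. The categories, training utterances, and scoring function below are illustrative assumptions for the sketch, not any vendor's actual implementation; production routers are trained on thousands of transcribed calls.

```python
from collections import Counter

# Tiny illustrative training set: caller utterances labeled with
# routing categories (real systems use far more data).
TRAINING = [
    ("i have a question about my bill", "billing"),
    ("why was i charged twice this month", "billing"),
    ("my internet connection keeps dropping", "technical support"),
    ("the router will not power on", "technical support"),
    ("i want to upgrade my plan", "sales"),
    ("what new phones do you have", "sales"),
]

def train(examples):
    """Count how often each word appears under each category."""
    counts = {}
    for text, category in examples:
        counts.setdefault(category, Counter()).update(text.split())
    return counts

def route(utterance, counts):
    """Pick the category whose training words best cover the utterance."""
    words = utterance.lower().split()
    def score(category):
        bag = counts[category]
        total = sum(bag.values())
        return sum(bag[w] / total for w in words)
    return max(counts, key=score)

model = train(TRAINING)
print(route("I was charged too much on my bill", model))  # billing
print(route("my connection is down again", model))        # technical support
```

Note that the caller never has to say the word "billing": the overlap with words seen in billing-labeled training utterances ("charged," "bill") is enough to route the call.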
Last year saw the arrival of two groundbreaking systems. If you watched IBM's Watson system playing (and winning) Jeopardy on TV against human competitors, or have seen Apple's Siri personal assistant, you know that there are some very exciting things going on with NLU technology.
How do these systems work, and how will technologies like these eventually play a role in speech systems? Let's look at Watson first. The technology used in Watson has advanced in two directions: processing the language itself, and connecting the results of language processing with huge amounts of knowledge. Watson's techniques for processing natural language involve analyzing the syntax and semantics of each question in detail and comparing the result against petabytes of information about the world. IBM is currently working on commercializing Watson for applications that require understanding huge amounts of data and synthesizing appropriate answers. Applications could be based on, for example, medical data or, on a smaller scale, troubleshooting information for consumer products. Nuance and IBM have announced a research project looking at commercializing Watson for the healthcare industry. Currently, Watson requires a massive computational infrastructure, but its computing requirements should decrease dramatically with time.
Apple's Siri, available on the iPhone 4S, also has impressive natural language abilities. Siri is an intelligent personal assistant that lets you use your voice to do things like add appointments to your schedule, make calls, and send text messages. Siri's speech recognition is very accurate, and its ability to understand what it hears is quite good. But it's Siri's connection to the world that is especially impressive. It excels at location-based services like local business search, and is well integrated with the Wolfram Alpha computational knowledge engine, so it can answer general questions like "Who was the fourth president of the United States?" or "What is the cube root of 437?" When Siri first appeared, developers wanted to know if there was an API they could use to add Siri-like functionality to their own apps. So far, nothing has been announced, although there are rumors one may be in the works. Siri, or a Siri-like system, would be a perfect fit for customer service applications because of its multimodal capabilities, its ability to accept flexible natural language input, and its integration with external information sources.
Other intriguing applications of natural language understanding are emerging as well. Sentiment analysis looks at data on the Web, like customer reviews, and classifies customer opinions about products. This is harder than it looks. Reviews like "This product is a waste of money" obviously convey a strong negative sentiment. But sentiment is not always so easy for a machine to spot. For example, "This product has a beautiful design and an incredible price. I was very happy with it" sounds like a positive review, until you read the next sentence: "But it set my house on fire." The machine has to know that setting the house on fire cancels out the product's positive features. Current research is aimed at accurately identifying these more indirect sentiments.
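The house-on-fire problem is easy to reproduce with the simplest sentiment technique, a lexicon-based word counter. The word lists below are illustrative assumptions; the point is that no plausible lexicon flags "set my house on fire" as negative, because the negativity lies in world knowledge, not in any individual word.

```python
# Minimal lexicon-based sentiment scorer (illustrative word lists only).
POSITIVE = {"beautiful", "incredible", "happy", "great", "love"}
NEGATIVE = {"waste", "terrible", "broken", "awful", "hate"}

def naive_sentiment(review):
    """Score a review as (# positive words) - (# negative words)."""
    words = review.lower().replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(naive_sentiment("This product is a waste of money"))  # -1: correct
review = ("This product has a beautiful design and an incredible price. "
          "I was very happy with it. But it set my house on fire.")
print(naive_sentiment(review))  # +3: the fire goes completely undetected
```

The scorer handles the obvious case but gives the fire review a strongly positive score, which is exactly the gap that research on indirect sentiment is trying to close.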
Siri, Watson, and other advanced natural language technologies are paving the way for much more capable and natural applications. I believe we will start to see technologies like these applied to speech applications in call centers, in the enterprise, and elsewhere in the very near future.
Deborah Dahl, Ph.D., is principal at speech and language consulting firm Conversational Technologies, a member of the board of directors of AVIOS, and chair of the World Wide Web Consortium's Multimodal Interaction Working Group. She can be reached at firstname.lastname@example.org.