Watson: Much Ado About Toronto
Unless you’ve been living under a moss-strewn rock, you’re aware that IBM recently either pulled off a massive marketing stunt or ushered in the next era of computing, depending on the blogosphere pundit in your earshot. For three days in February, IBM’s intelligent speaking supercomputer Watson not only played Jeopardy but also handily beat two celebrated human contestants. Yet a Final Jeopardy mistake on the first day had tongues wagging and fingers pointing, followed by much explanation of the gaffe. Watson responded to a clue about a U.S. airport with the answer “What is Toronto?” Oops. While the surprising error in an otherwise dominating performance has been analyzed ad nauseam (http://www.pbs.org/wgbh/nova/tech/ferrucci-smartest-machine.html), such errors are constants in everyday human interaction and really no big deal.
I’ve been known to argue passionately that miscommunication handling is the most important part of building intelligent speaking applications. Of course, Watson wasn’t designed to converse but simply to answer and select clues, in the highly constrained interactive style of Jeopardy. That may be fine language usage for a TV game show but not so much in the practical world of real interaction.
Most designers know that writing error recovery is half, or even three-quarters, of the real work in dialogue design. And, in contrast to our engineering colleagues, we’re pretty cool with errors. They aren’t something to avoid and debug at all costs; they are a natural part of everyday conversation. We don’t gasp and stay up all night if someone gets an error message. Instead, we make sure that we correctly anticipated the cause of the breakdown, that we offered the appropriate message content, and that the user successfully advanced the dialogue. Same as when we speak to people.
Research on breakdown reveals a hierarchy of error types that occur in every conversation. The two broad categories of conversational error are input failures and model failures. The former can stem from an inability to hear information (perceptual failure), incorrect or no interpretation of information (lexical failure), or misunderstood meaning (syntactic failure). Model failures are more complex and occur because of a lack of information, different belief systems of interactants, inadequate inferencing, and other cognitive issues.
Theorists suggest that when we recognize a problem during conversation, we first assume a cause at the lowest levels of the hierarchy, because those causes require the least cognitive effort and are the easiest to resolve. For example, if we assume a perceptual failure first, the fix is generally to increase vocal intensity, which also slows the speaking rate, and to repeat the message verbatim. However, higher-level error types would not be quickly resolved using that method. When we can’t assume an easy cause, we resort to emphasis of specific points, analogy, examples, inference explication, and other explanations to get the conversation back on track to shared meaning. Interestingly, while lower-level failures are generally seen as the responsibility of the speaker to fix independently, higher-level causes are perceived as being the responsibility of both conversational partners to resolve.
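For designers who think in code, the escalation described above can be sketched roughly as follows. This is only an illustration of the idea, not a published algorithm: the names `Failure`, `REPAIRS`, and `resolve` are hypothetical, and the repair wording for the lexical and syntactic levels is an assumption, since only the perceptual and higher-level repairs are described here.

```python
from enum import IntEnum


class Failure(IntEnum):
    """Error hierarchy, ordered from least to most effort to resolve."""
    PERCEPTUAL = 1  # input failure: the message was not heard
    LEXICAL = 2     # input failure: incorrect or no interpretation
    SYNTACTIC = 3   # input failure: misunderstood meaning
    MODEL = 4       # missing information, differing beliefs, weak inference


# Repair attempted at each level. The LEXICAL and SYNTACTIC entries are
# assumptions made for illustration; the other two follow the text.
REPAIRS = {
    Failure.PERCEPTUAL: "repeat the message verbatim, louder and slower",
    Failure.LEXICAL: "rephrase with different wording",       # assumption
    Failure.SYNTACTIC: "restate with emphasis on key points",  # assumption
    Failure.MODEL: "explain with examples, analogy, or explicit inference",
}


def resolve(understood_after):
    """Try repairs from the lowest level upward.

    `understood_after` is a callable that takes a Failure level and
    returns True once the listener signals understanding. Returns the
    list of (level, repair) pairs that were attempted, in order.
    """
    attempts = []
    for level in Failure:  # iterates in definition order, lowest first
        attempts.append((level, REPAIRS[level]))
        if understood_after(level):
            break
    return attempts
```

Calling `resolve(lambda level: level == Failure.MODEL)` walks through all four levels before succeeding, which is one way to see why a model failure costs more conversational effort than a simple mishearing.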
Watson’s Toronto breakdown was a model failure because of inadequate background knowledge; a socially skilled human conversational partner would quickly resolve that by filling in the missing knowledge. Imagine if Watson were simply speaking to Mr. Trebek, instead of playing Jeopardy:
Watson: Well, I think the answer might be Toronto, but I’m not quite sure.
Trebek: [incredulous] Seriously? Toronto’s in Canada.
Watson: OK, then I think the answer is Chicago.
Trebek: And you’d be correct. (aside, chuckling) Didn’t you guys bother to feed Watson a map or a geography book?
The communicative constraints of the Jeopardy game itself worked against Watson because there is no option for resolving problems through dialogue. Watson had low confidence in his Toronto answer and, even more importantly, had the correct answer among his other low-confidence responses. In a single turn of conversation, this miscommunication could have been painlessly resolved. Stuff like that happens to us humans all the time.
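The difference between the game-show constraint and a conversational repair can be sketched in a few lines. This is a hypothetical illustration, not a description of how Watson works: the function `answer_with_repair`, the `hedge_below` threshold, and the confidence values in the usage note are all assumptions chosen only to mirror the imagined exchange with Mr. Trebek.

```python
def answer_with_repair(candidates, is_correct, hedge_below=0.5, max_turns=2):
    """Offer ranked answers one turn at a time until one is accepted.

    candidates:  list of (answer, confidence) pairs, in any order
    is_correct:  callable standing in for the partner's feedback
    hedge_below: confidence under which the answer is phrased tentatively
                 (an illustrative threshold, not a known system setting)
    max_turns:   how many answers the interaction style allows;
                 1 models the single-response game format
    Returns (accepted_answer_or_None, list_of_utterances).
    """
    ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
    transcript = []
    for answer, confidence in ranked[:max_turns]:
        if confidence < hedge_below:
            transcript.append(
                f"I think the answer might be {answer}, but I'm not quite sure."
            )
        else:
            transcript.append(f"The answer is {answer}.")
        if is_correct(answer):
            return answer, transcript
    return None, transcript
```

With illustrative candidates such as `[("Toronto", 0.14), ("Chicago", 0.11)]` and feedback that accepts only Chicago, `max_turns=1` ends in failure, while `max_turns=2` recovers the correct answer on the second turn.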
As advanced speech technology moves from research experiment to practical application in business, its usefulness and usability will largely be determined by whether it can successfully shift from the highly constrained Jeopardy exchange to a more robust interaction that allows for easy resolution of miscommunication. As in other language-based interactions, when miscommunication is viewed as inherent and is designed for with an eye to human behavior, errors can be resolved outside the awareness of users, just as they are in our daily interactions. And that isn’t trivial to Watson’s acceptance, in Toronto or anywhere else.
Melanie Polkosky, Ph.D., is a social-cognitive psychologist and speech language pathologist who has researched and designed speech, graphics, and multimedia user experiences for almost 15 years. She is currently a human factors psychologist and senior consultant at IBM. She can be reached at firstname.lastname@example.org.