Researchers at the International Computer Science Institute (ICSI) in Berkeley, Calif., are working with Microsoft to advance the state of the art in human-computer interaction relying on speech and other modalities.
"[This] creates a win-win situation," says Roberto Pieraccini, director of ICSI. "Both Microsoft and ICSI have among the best researchers in speech, so [this] collaboration…has the potential to produce substantial advancements in the field. Microsoft researchers are working with a focus on commercial realizations, while ICSI researchers follow a more academic and curiosity-driven approach. Thus, in this collaboration, ICSI has a unique opportunity to work on specific problems and real data, while Microsoft can benefit [from] the ICSI long-term research vision."
Researchers will use information conveyed by the melody and rhythm of speech, known as prosody, to improve automatic speech understanding.
"One of the first projects is that of understanding how to use speech prosody—the intonation always present in natural speech—to extract information which can be used to identify the intention and the emotional state of the speaker," Pieraccini says. "While there have been several attempts in the past to use prosodic information at the level of speech recognition, we have not been able to use [it] effectively. In this study, we are trying to push those attempts further.
"This work is particularly important now, as the popularity of devices that understand and produce speech grows more quickly than ever before," he adds.
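To make the idea of prosodic features concrete, the sketch below estimates one basic prosodic cue, the fundamental frequency (F0, perceived as pitch), from a short audio frame using autocorrelation. This is a generic textbook method shown purely for illustration, not the technique used by the ICSI or Microsoft researchers; the signal here is a synthetic tone standing in for a voiced speech frame.

```python
import numpy as np

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=500.0):
    """Estimate fundamental frequency (Hz) of a voiced frame via autocorrelation."""
    frame = frame - frame.mean()
    # Full autocorrelation; keep only non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo = int(sample_rate / fmax)  # shortest plausible pitch period, in samples
    hi = int(sample_rate / fmin)  # longest plausible pitch period, in samples
    lag = lo + np.argmax(ac[lo:hi])
    return sample_rate / lag

# Synthetic stand-in for a voiced frame: a 200 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(int(0.04 * sr)) / sr  # 40 ms frame
frame = np.sin(2 * np.pi * 200.0 * t)
print(round(estimate_f0(frame, sr)))  # prints 200
```

Tracking F0 frame by frame over an utterance yields a pitch contour; in real prosody research, such contours are combined with timing and energy features to help infer speaker intent and emotional state.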
Elizabeth Shriberg and Andreas Stolcke, principal scientists at Microsoft's Conversational Systems Laboratory (CSL) and ICSI external fellows, will lead the effort. CSL, an applied research group within Microsoft's Online Services Division based at the Microsoft campus in Sunnyvale, Calif., is exploring novel ways to interact naturally with computer systems and services using speech, natural language text, and gesture. Its aim is to enable conversational understanding of users' inputs and intentions across a variety of devices, from mobile phones to Xbox consoles in the living room. CSL conducts research spanning a range of scientific disciplines, from acoustic to semantic and affective language processing.
"Patterns of timing and intonation in spoken language encode information far beyond that conveyed by words alone. This information is important for achieving natural and efficient conversational interactions with machines," Shriberg said in a statement. "We expect to accelerate progress on human-computer dialogue systems that better understand and use cues in human-human spoken communication that we often take for granted."
The collaboration draws on ICSI's history of excellence in speech processing research and on Microsoft's wealth of data, technology, and experience in deploying natural speech interfaces in its services and applications. It could eventually extend beyond the prosodic aspects of speech to additional research topics in human-computer interaction.