Innovative Research in the Labs Part IV - Carnegie Mellon University
Taking a more academically oriented approach to research, this month I reviewed the speech groups at Carnegie Mellon University (CMU) in Pittsburgh, Pa. Centered in the Language Technologies Institute (LTI) in the School of Computer Science are two large research groups, the Speech Group and the international center for Advanced Communications Technologies (interACT). LTI conducts research and provides graduate education in all aspects of language technology and information management.
Speech research at CMU has produced some of the most fundamental and groundbreaking advances in speech technologies, starting in the 1970s with the first demonstrations of large-vocabulary speech recognition systems. By the late 1980s, attention had shifted to speech technology based on hidden Markov models, producing the well-known Sphinx and JANUS systems. The work continued with the first demonstration of spoken language technology with the Phoenix system in the early 1990s and the first simultaneous lecture translation system in 2005. Throughout the 1990s and continuing today, the focus has moved, at least in part, to application development, including multilingual conversational systems, speech-to-speech translation, and multimodal interactive systems.
Recently, groundbreaking research at interACT was presented in a press conference spanning two continents: the simultaneous translation of talks and lectures, and the recognition of silently spoken speech by relying solely on the movement of the facial muscles (www.is.cs.cmu.edu). In addition to research, a wealth of talent has also emerged from CMU, helping to create and mold the speech industry. Because many research results are open source, the entire industry can benefit, and has benefited, from CMU's research and development efforts.
As a university, Carnegie Mellon has a mission that differs somewhat from that of the more commercially oriented research labs we have reviewed in the past. CMU has the dual mission of creating great research and educating the next generation of researchers. As part of its educational mission, CMU researchers are hosting and organizing (with colleagues from the University of Pittsburgh and local companies) the Interspeech 2006 meeting, the ninth International Conference on Spoken Language Processing (www.interspeech2006.org).
Although both research groups focus on technology and applications in speech and language, each has specialized areas of focus. The CMU Speech Group has continued to develop and improve the CMU Sphinx speech recognition system, which is widely distributed in open source form. Work on robust speech recognition for difficult acoustical environments (such as those with background noise and reverberation) is continuing and presently focuses on multisensor processing and on signal processing motivated by the structure and function of the human auditory system. Other research includes spoken dialog management architectures for complex problem-solving tasks, such as the "Let's Go" project, designed to provide bus schedule information for all populations, and TeamTalk, a multimodal system for F18 aircraft maintenance personnel. Major educational projects include Fluency, a system to facilitate foreign language pronunciation, and Project LISTEN, a system to improve literacy training in children. Additionally, colleagues in the Department of Electrical and Computer Engineering are working to develop a silicon VLSI chip that implements the Sphinx decoding algorithms.
InterACT is a joint center between the University of Karlsruhe in Germany and CMU's Language Technologies Institute, hosted respectively by the Fakultät für Informatik in Germany and LTI in the U.S. The interACT center has a dual goal of research and education. The academic mission is to train students, faculty, and staff to operate in international research teams across multinational and multicultural boundaries. InterACT runs a highly successful international exchange program, along with workshops and seminars. The research mission covers a wide area, including multilingual speech processing, multimodality, pervasive computing, and human-to-computer and human-to-human communication. This includes research in handwriting recognition, lip reading, gaze tracking, and sign translation. InterACT also conducts speech research in multilingual speech recognition; language identification; speaker recognition; automatic translation of signs, text, and speech; speech summarization; discourse analysis and speech understanding; speech recognition based on electromyography and electroencephalography; and even recognition of, and communication with, dolphins.
There is a lot going on at CMU. If you are interested in cutting-edge and even fun research, check out the projects on the CMU Web pages at www.speech.cs.cmu.edu and www.is.cs.cmu.edu.
Have any innovative research news from R&D? Please email me.
Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at firstname.lastname@example.org.