ICSI Research Building Speech Recognition Systems for New Languages

The International Computer Science Institute (ICSI) is leading a research team under the Intelligence Advanced Research Projects Activity (IARPA) Babel program focused on building speech recognition solutions with self-imposed time and data limitations for a variety of languages.

The work aims to better understand fundamental challenges and discover new methods for development of speech models for languages that could emerge as important in the future.

"The goal of the Babel program is to rapidly build speech recognition systems to support effective keyword search for new languages using limited amounts of transcribed speech recorded in real-world conditions," said Mary Harper, the IARPA manager in charge of the program, in a statement.

Using only a fraction of the training data usually required, the team aims to build speech recognition systems for several languages in just one week by the end of the program.

"ICSI excels at intellectual challenges and unique approaches to research. This is an intriguing project that puts significant constraints on our researchers as a means to discover better ways to develop automatic speech recognition systems," said Roberto Pieraccini, director and president of ICSI, in a statement.

By working on a variety of languages with time and data restrictions, the team will research basic principles of speech technology rather than pursue incremental improvements to existing technology. In addition, this research will help enable keyword-search systems for languages that do not have large amounts of transcribed audio.

"The speech recognition systems we've built in the past have the curse of being reasonably good, particularly for a few languages and speech recorded in good acoustic conditions, which has often reduced the impetus to significantly change the technology," said Nelson Morgan, deputy director and leader of the Speech Group at ICSI, in a statement. "This project strongly pushes us to solve fundamental problems in speech recognition to address the Babel challenge."

In each of the four periods of the project, the team will be given a set of languages and tasked with developing methods to quickly build a system. Speech recognition systems are typically trained on thousands of hours of transcribed audio. In this project, the team was initially given only 80 hours of conversational speech for each language, and in each succeeding period a smaller fraction of the audio is transcribed. At the end of each period, the team will be given a new language for which it must build a system, initially in four weeks but, by the end of the program, in just one week.

The project is funded by IARPA, a research arm of the Office of the Director of National Intelligence, which invests in high-risk/high-payoff research programs.
