The Next "Great Leap Forward"

Voice has long been an integral part of education and societal growth. About 40,000 years ago, Homo sapiens underwent what Jared Diamond characterized as the "Great Leap Forward," when human culture changed at an accelerated rate. Diamond, a professor of geography at UCLA and Pulitzer Prize-winning author of Guns, Germs, and Steel: The Fates of Human Societies, maintains this "leap" coincided with the spread of human language. In time, the development of writing systems enabled communication with people in different places and times. Today, advancements in speech technology have given voice to the written language, which has the potential to pave the way for another "leap" in education.

While speech in education is mainly used in areas that are cheap and easy to automate (signing up for courses, reporting grades, and paying fees), it is also extending its reach into teaching and learning. Nonprofits such as Bookshare.org, as well as major corporations such as Apple, IBM, and Microsoft, use speech technology to make content more accessible. People can take advantage of these technologies while multitasking. For example, students may listen to a biology text while driving to class, or input data descriptions while examining specimens in a lab or in the field.

Just because information is accessible doesn't mean it is easy to comprehend, especially when dealing with foreign languages. However, speech input/output technologies, when combined with translation and data mining technologies, can break down global linguistic barriers by translating content in foreign languages.

Speech technology is also helping those who have communication difficulties. Speech input and output can help the visually impaired, dyslexic, speech impaired, and limited-literacy members of our population. For example, text-to-speech synthesis can read content to the blind and dyslexic. For dyslexics, seeing the text while they listen is particularly important. Use of synthesis—as opposed to natural voices—is important so that all content can be made available to those who need it.

Educational applications where speech is already deployed and profitable include assessment of spoken language over the telephone (e.g., Ordinate, now a part of Harcourt Assessment), speech input and output as a component of more general language learning activities (e.g., SRI's DynaSpeak tools, NeoSpeech, DynEd, and many other language education companies), and assessment of reading fluency (e.g., Soliloquy Learning). Other companies, such as Utopy, are deploying speech-mining technology that can make speech searchable in ways previously reserved for written documents. There are many more applications of speech technology in small but growing markets and in the laboratory prototype stage.

A session called "Speech and Language in Educational Applications" at Interspeech2006.org will take place in Pittsburgh, Pa., on September 17-21, 2006. A range of demonstrations for K-12 through college-level applications will include speech in automatic assessment of children's oral reading, a prototype for learning Japanese, a conversational agent for tutoring speech and language learning, and an intelligent tutoring system for lexical practice and reading comprehension. One goal of the session is to inform researchers of commercial efforts and real-world conditions. Too often, researchers work to squeeze a few more percentage points out of core technology, while real-world use might be dominated by factors that make such improvements of little importance.

Observation of deployments can inspire new research directions that too often are missed by companies, which need to focus on the very near term, and by researchers, who may never get beyond a prototype stage. An equally important goal of the session is to inform commercial efforts and educational representatives of research directions that might eventually be deployed.

With today's speech technology capabilities and the increased awareness of speech in education, we may be on the verge of another "great leap forward." This time, speech input and output can play key roles in making all content, spoken and written, in all languages, available, accessible, and easy to retrieve when we need it. Whether human culture changes as a result remains to be seen.

Patti Price has more than 20 years of experience in developing and transferring speech and language technology, including directing the Speech Technology and Research Lab at SRI International and co-founding Nuance Communications, BravoBrava!, and Soliloquy Learning. Her formal education includes postdoctoral training in electrical engineering at MIT, a PhD in linguistics from the University of Pennsylvania, and a degree in French literature from the University of Poitiers. She is presently a consultant specializing in speech technology for educational applications, PPRICE.com.
