May 1, 2009
By Judith Markowitz, Principal, J. Markowitz Consultants
Forward Thinking

Speech for Distance Learning

Distance learning (DL) delivers educational services to students who are not physically present at an educational institution, using a range of instructional media that includes CDs, handheld devices, television, and online coursework. Speech recognition (SR) is frequently listed among the technologies important for DL. Speech has been promoted for DL uses that include singing instruction, literacy enhancement, real-time transcription of televised lectures, exercises for patients with aphasia, and assistive tools.

Many of these uses remain primarily research efforts. That is not the case for SR in foreign-language teaching. In DL, where no human tutor is available to listen to and correct the student, the value of SR for learning to speak a foreign language is obvious. The following two products illustrate how SR is configured for DL of foreign languages:

EduSpeak

EduSpeak is a software development kit (SDK) developed by SRI International, which has been working on SR for language learning for more than 20 years. The SDK can be used to build classroom-support modules as well as DL products. In addition to tools and functions for integrating SR into language-teaching software or services, it contains SR language models for the target language and a pronunciation scoring tool (PST).

Two language models are created for each target language. Both are statistical and are built from a corpus of samples provided by many speakers. Each requires approximately two months of development for a new language, although SRI's adaptation Web service can shorten that time. The native-speaker model is identical to the speaker-independent language models created for other SR applications. The second model is built from the speech of non-native speakers with varying levels of pronunciation proficiency in the target language. This non-native model is needed to recognize learners' speech accurately, especially the speech of students with strong accents.

The non-native speech corpus is also used to construct the PST. The PST evaluates the “nativeness” of a student’s pronunciation at the utterance and phoneme levels. This is essential for pronunciation learning in a DL context. Each sample in the non-native corpus is evaluated by a team of trained native speakers. Their judgments are combined with the phonetic descriptions of the samples to produce the set of evaluation metrics that form the PST.
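The article does not describe how SRI turns the judges' ratings into PST metrics. As one hypothetical illustration of combining human judgments with machine measurements, the sketch below averages each utterance's ratings and fits a least-squares line that maps a machine-computed feature onto the judges' scale. The feature (assumed here to be something like a mean phone log-posterior), the 1-to-5 rating scale, and all function names and data are invented for the example; this is a swapped-in regression technique, not SRI's method.

```python
from statistics import mean


def average_ratings(ratings_by_utterance):
    """Collapse each utterance's list of native-speaker scores into one target."""
    return {utt: mean(scores) for utt, scores in ratings_by_utterance.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def calibrate(features, ratings_by_utterance, low=1.0, high=5.0):
    """Return a scorer that maps a machine feature onto the judges' scale.

    `features` maps utterance IDs to a hypothetical machine measurement
    (e.g. mean phone log-posterior); higher is assumed to sound more native.
    """
    targets = average_ratings(ratings_by_utterance)
    utts = sorted(targets)
    a, b = fit_line([features[u] for u in utts], [targets[u] for u in utts])
    # Clamp predictions to the rating scale used by the human judges.
    return lambda feature: min(high, max(low, a * feature + b))


# Invented example data: four utterances, two or three judges each.
features = {"u1": -4.0, "u2": -2.0, "u3": -1.0, "u4": -0.5}
ratings = {"u1": [1, 1, 1], "u2": [2, 4, 3], "u3": [4, 4, 4], "u4": [4, 5]}
score = calibrate(features, ratings)
print(score(-1.5))  # 3.5
```

Averaging first keeps a single harsh or lenient judge from dominating any one utterance; a production system would presumably use many features and far more data.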

Developers use the language models, PST, grammar builders, and other tools to create DL and other products. At runtime, the SR component runs on the student's desktop or other client machine.

Tell Me More

Tell Me More, produced by Auralog, is a line of packaged and online software for learning foreign languages.

Auralog was the first company to offer commercial language-learning software with embedded SR. From the start, the company's goal was to make pronunciation part of language learning alongside reading, writing, and listening. Tell Me More combines these skills with cultural information in a structured language-teaching framework.

Auralog does not develop its own SR engine, but it did build its own PST, which evaluates pitch, intonation, and word-level pronunciation. Using SR output, the PST rates each student utterance on a scale from 1 to 7 and highlights problem areas, such as mispronounced words. Pronunciation lessons include phonetic exercises that target specific problematic phonemes. These exercises combine graphics (such as a cross-section of the mouth showing how a phoneme sequence should be articulated) with SR-based feedback from a spoken-error tracking system. These PST tools point students to problem areas and tell them when they are ready to move to the next level. Performance requirements become stricter as the student advances to higher levels of difficulty.
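The combination of a 1-to-7 rating with pass requirements that tighten at higher levels can be sketched as follows. Auralog's actual scoring rules are not given in the article: the normalized input score, the rounding, the starting pass mark of 3, and the one-point-per-level increase are all hypothetical choices made only to show the shape of such a gate.

```python
import math


def to_rating(score):
    """Map a hypothetical normalized score in [0, 1] onto a 1-7 scale, rounding half up."""
    score = min(1.0, max(0.0, score))
    return 1 + math.floor(score * 6 + 0.5)


def required_rating(level, base=3, top=7):
    """Hypothetical rule: the pass mark rises by one per level, capped at 7."""
    return min(top, base + level)


def may_advance(score, level):
    """True when the utterance's rating meets the current level's pass mark."""
    return to_rating(score) >= required_rating(level)


print(to_rating(0.5), may_advance(0.5, 0), may_advance(0.5, 2))  # 4 True False
```

The same utterance that passes at the entry level (level 0) fails two levels later, which is the "stricter requirements" behavior the article describes.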

SR is particularly important for learning tone languages, such as Chinese. Mandarin Chinese has four distinct tones, so the same string of phonemes can form different words depending on the tone with which it is spoken. Without SR, a student working alone would have no reliable way to check whether the tones were produced correctly, which would make effective DL of Chinese extremely difficult for anyone hoping to speak, read, and write the language.
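A small example makes the point concrete. The Mandarin syllable "ma" means "mother," "hemp," "horse," or "scold" depending on whether it is said with the first, second, third, or fourth tone. The sketch below uses numbered pinyin (tone digit after the syllable) and shows that a recognizer that ignores tone cannot distinguish the four words. The lexicon and helper names are hypothetical and are not taken from Tell Me More.

```python
# Mandarin syllable "ma" in numbered pinyin: the digit is the tone.
MA_WORDS = {
    "ma1": ("妈", "mother"),
    "ma2": ("麻", "hemp"),
    "ma3": ("马", "horse"),
    "ma4": ("骂", "scold"),
}


def strip_tone(syllable):
    """Drop the trailing tone digit, keeping only the phoneme string."""
    return syllable.rstrip("12345")


def words_for_phonemes(phonemes, lexicon):
    """Meanings a tone-blind recognizer could not tell apart."""
    return sorted(meaning for syl, (_, meaning) in lexicon.items()
                  if strip_tone(syl) == phonemes)


print(words_for_phonemes("ma", MA_WORDS))  # ['hemp', 'horse', 'mother', 'scold']
print(MA_WORDS["ma3"][1])                  # horse (tone resolves the ambiguity)
```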

In Tell Me More, SR is one component of a pedagogical system that is designed to systematically move students toward more native-like language behavior.

Judith Markowitz, Ph.D., is president of J. Markowitz Consultants and a leading independent analyst in the speech and voice biometrics field. She can be reached at judith@jmarkowitz.com.
