Speech Recognition and Language Learning: A User's Plea

Phillip Britt's article on applications of speech technology in education (June/July) does not mention foreign language learning. As an adult currently learning Spanish, may I make a request to developers to meet a major need?

Adults build up vocabulary mainly by reading. But a major irritation in reading a text in an unfamiliar language is the frequent need to look up words and phrases in a dictionary. Of course, if you are reading on a computer you can already get visual pop-up dictionaries, whose definitions are accessed by highlighting and clicking words. But most reading is still done from plain old paper, which suggests adding speech to the loop.

I would like to see a product, in the form of a laptop or PDA program, a special-purpose device, or a mobile phone service, that would generate the definition from spoken input. The definition should be offered in both visual and spoken form.

The student will probably mispronounce the language, which makes the training aspect tricky. A speech recognition module tuned too finely to native speakers would be frustrating for learners; one that trained itself to accept their mistakes uncritically would miss an important learning opportunity. The dictionary should therefore recognize mistaken pronunciations and offer corrections. The correction level should be variable according to the level of the learner.

To do this, the training phase should get its reference phonemes from samples of the learner's speech in his or her native tongue. Most or all of the phonemes of the target language will exist in the mother tongue, so that most mistakes of substitution can be identified. An English learner of Spanish will not encounter unknown phonemes, though the converse is not true. For these cases, you need a sub-module specifically on the strange phonemes (voiced "th" in English, French nasals, Welsh unvoiced "l," etc.), to be entered preferably in the presence of a teacher.

As an example of the different levels of mistake, take the Spanish name "Andalucia." A typical English beginner could well say "And-er-LU-z-ya" instead of "And-a-lu-th-EE-ya," making four mistakes:

placing the stress on "u" rather than "i";
pronouncing "c" as "z" rather than "th" (or New World "s");
pronouncing the initial "a" as in "cat"; and
pronouncing the second "a" as a schwa, when both should be a shorter version of the "a" in "father."

These errors - stress displacement, consonant substitution, and vowel shift - correspond to successive levels of learning, and so to possible settings of the program's correction function. (There is yet another level, of the precise values of consonants, as in the initial "b/v" of "Valencia," but it would be unreasonable to tackle this in a speaking dictionary.)
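To make the idea of graded correction concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any existing product: the names `Pronunciation`, `Mistake`, `find_mistakes`, and `corrections_for` are hypothetical, the phoneme transcriptions of "Andalucia" are rough and hand-made, and the learner's attempt is assumed to be already aligned phoneme by phoneme with the reference (a real recognizer would have to do that alignment itself).

```python
from dataclasses import dataclass

# Correction levels, in the order the article suggests learners tackle them.
STRESS, CONSONANT, VOWEL = 1, 2, 3

# Assumption: a tiny hand-picked vowel inventory, enough for this one example.
VOWELS = {"a", "e", "i", "o", "u", "æ", "ə"}


@dataclass(frozen=True)
class Pronunciation:
    phones: tuple  # one symbol per phoneme
    stress: int    # index into phones of the stressed vowel


@dataclass(frozen=True)
class Mistake:
    level: int
    message: str


def find_mistakes(reference, attempt):
    """List every difference between the attempt and the reference."""
    if len(reference.phones) != len(attempt.phones):
        raise ValueError("this sketch needs pre-aligned sequences of equal length")
    mistakes = []
    if reference.stress != attempt.stress:
        wanted = reference.phones[reference.stress]
        heard = attempt.phones[attempt.stress]
        mistakes.append(Mistake(STRESS, f"stress '{wanted}', not '{heard}'"))
    for expected, heard in zip(reference.phones, attempt.phones):
        if expected == heard:
            continue
        level = VOWEL if expected in VOWELS else CONSONANT
        mistakes.append(Mistake(level, f"say '{expected}', not '{heard}'"))
    return mistakes


def corrections_for(reference, attempt, learner_level):
    """Keep only the mistakes at or below the learner's correction level."""
    return [m for m in find_mistakes(reference, attempt) if m.level <= learner_level]


# Rough transcriptions: the target form and the beginner's version from the text.
reference = Pronunciation(("a", "n", "d", "a", "l", "u", "θ", "i", "a"), stress=7)
attempt = Pronunciation(("æ", "n", "d", "ə", "l", "u", "z", "i", "a"), stress=5)

for mistake in corrections_for(reference, attempt, CONSONANT):
    print(mistake.level, mistake.message)
```

With the setting at `STRESS` the learner hears about only the misplaced stress; raising it to `VOWEL` reports all four mistakes listed above. Filtering by level, rather than retraining the recognizer, is one possible way to keep the tool from either nagging beginners or accepting errors uncritically.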

The dictionary would have to deal with inflections, either by long word-lists or simple parsing. But since available online dictionaries already handle inflections, the problem is manageable.
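As a rough illustration of the "simple parsing" option, the sketch below tries an exact match first and then falls back to stripping a few Spanish endings. Everything in it is hypothetical: the three-word `LEXICON`, the handful of `SUFFIX_RULES`, and the `lookup` function are invented for the example and cover only a sliver of real Spanish inflection.

```python
# Assumption: a toy lexicon mapping headwords to English glosses.
LEXICON = {
    "casa": "house",
    "libro": "book",
    "hablar": "to speak",
}

# Assumption: (ending to strip, text to put back), tried in this order,
# so longer endings are listed before shorter ones.
SUFFIX_RULES = [
    ("amos", "ar"),  # hablamos -> hablar
    ("es", ""),      # plural in -es
    ("s", ""),       # casas -> casa
]


def lookup(word):
    """Return (headword, gloss) for a possibly inflected word, or None."""
    word = word.lower()
    if word in LEXICON:
        return word, LEXICON[word]
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate, LEXICON[candidate]
    return None


print(lookup("casas"))
print(lookup("Hablamos"))
```

A candidate is accepted only if it is itself a headword, which keeps the crude rules from inventing words; irregular forms would still need the long word-list approach.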

James Wimberley has recently retired from a long career at the bilingual Council of Europe, where he dealt with a range of educational issues, technical assistance to new member states in Central and Eastern Europe, and the Council's contribution to the World Summit on the Information Society. He and his wife live in Malaga, Spain.
