Are ASR Engines Ready for Machine Translation?

Thanks to artificial intelligence and related advancements, automatic speech recognition (ASR) engines have become immensely more powerful and accurate in recent years. But in the case of ASR text headed to machine translation for subtitles, verbal SMS, speech-to-speech, or automatic interpretation, things can get tricky. ASR text results strongly affect the quality of machine translation; minor discrepancies that are not earth-shattering in transcription blow up into major errors when translated. Thus, about 20 percent more controls over ASR text results are required when they are destined for another language versus simply being transcribed.

We recently spent a month analyzing the text results from seven major ASR vendors to assess their readiness for translation. The best vendor for your organization will depend on the subject matter of your conversations; however, certain patterns emerged across the spectrum, enough to define key factors that we recommend analyzing in detail when selecting an ASR vendor for your voice translation needs.

Sentence Splits

The first requirement for good machine translation (MT) involves the ASR vendor’s “punctuation parameters.” Whenever possible, the MT engine should receive a full sentence with subject, verb, and direct object in order to provide context. Paragraphs are even better if your application does not run in real time. Shorter sentences are preferable to long, rambling prose, and good punctuation is critical to translation accuracy. In fact, some languages, such as Chinese, do not even have complex sentences in their grammar structure; in those cases, automatic translation has trouble properly digesting and translating heavy compound sentences.

When ASRs offer little punctuation, or periodically insert punctuation in the wrong location, sending these partial and incorrect sentences to translation engines results in greater inaccuracy in the new language.

Example transcription:

Sarah is a CSP speaker and an author of six books Sarah. Has appeared on 300-plus TV and radio shows.

Upon translation into some languages, this becomes “…six Sarah books have appeared on television and radio.”
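One way to avoid sending partial sentences to the MT engine is to hold ASR fragments until terminal punctuation arrives. The following is a minimal Python sketch of that idea, not any vendor's method; the function name buffer_sentences and the assumption that fragments arrive as plain strings are ours. It only addresses fragments that arrive without punctuation; it cannot repair a period that the ASR placed in the wrong spot, as in the example above.

```python
import re

# Split on whitespace that follows sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def buffer_sentences(fragments):
    """Yield complete sentences from a stream of ASR text fragments.

    Hypothetical helper: assumes each fragment is a plain string and that
    '.', '!' and '?' are the only sentence terminators.
    """
    pending = ""
    for fragment in fragments:
        pending = (pending + " " + fragment).strip()
        parts = SENTENCE_END.split(pending)
        # Every part except the last one ended in terminal punctuation.
        for sentence in parts[:-1]:
            yield sentence
        pending = parts[-1]
        if pending.endswith((".", "!", "?")):
            yield pending
            pending = ""
    if pending:
        # End of stream: flush whatever is left, even if unpunctuated.
        yield pending

fragments = [
    "Sarah is a CSP speaker",
    "and an author of six books.",
    "Sarah has appeared on",
    "300-plus TV and radio shows.",
]
sentences = list(buffer_sentences(fragments))
```

Here the four fragments are regrouped into two full sentences before anything is forwarded for translation. Abbreviations such as "Dr." would fool this simple rule, so a production system would need something more careful.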

Cleanup of ASR Results

One advantage of controlling the “cleanup” of ASR results yourself, before sending the text for translation, is the ability to correct errors before translation compounds them. A certain number of custom words may be added to an ASR system; however, if you have a diverse clientele, in-house control might be a good option, as you can run ASR text through a word-swap dictionary before forwarding it to the translation engine.

Speaker said “Rocket” (for Rocket Mortgage) returning as:

rock it

rocks it

rocked

racket

Speaker said “Quicken Loans” returning as:

quickie loans

quick bones

cricket loans

quick ones

quack homes
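A word-swap dictionary of the kind described above can be sketched in a few lines of Python. This is an illustration under our own assumptions, not a description of any product: the table WORD_SWAPS and the function apply_word_swaps are hypothetical names, and the entries are taken from the misrecognitions listed above. Single common words such as "rocked" and "racket" are left out on purpose, because swapping them blindly would corrupt sentences where they were meant literally.

```python
import re

# Hypothetical swap table; keys must be lowercase.
WORD_SWAPS = {
    "quickie loans": "Quicken Loans",
    "quick bones": "Quicken Loans",
    "cricket loans": "Quicken Loans",
    "quack homes": "Quicken Loans",
    "rock it": "Rocket",
    "rocks it": "Rocket",
}

def build_swap_pattern(swaps):
    """Compile one case-insensitive pattern matching any phrase in the table."""
    # Try longer phrases first so they win over shorter overlapping ones.
    phrases = sorted(swaps, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

SWAP_PATTERN = build_swap_pattern(WORD_SWAPS)

def apply_word_swaps(text, swaps=WORD_SWAPS, pattern=SWAP_PATTERN):
    """Replace known misrecognitions before the text goes to translation."""
    return pattern.sub(lambda m: swaps[m.group(0).lower()], text)

cleaned = apply_word_swaps("Call Quickie loans about rock it today")
```

In this sketch the cleaned text reads "Call Quicken Loans about Rocket today". The table would normally be maintained per client or per subject area, since a phrase that is an error in one domain may be correct in another.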

Accented Speech

A major differentiator between ASR engines is their ability to understand spoken accents. With some ASR engines, you must select the accent in order for it to be understood; the system will not hunt for greater accuracy on its own. We recommend that you test your ASR system with accented speakers if your client base necessitates it.

Checking Your Vocabulary

Simple, basic conversation is handled well by many ASR engines if words are pronounced clearly, but some engines tend to be less ready for terms in certain fields. So verify that your choice of engine handles your general field.

Example of ASR results:

Women are taking care of everything and doing all the work telling the fails [“tilling the fields”] and looking after the animals, IRAs Ting [“harvesting”] and taking the wheat to the meal [“mill”].

Test your machine translation engine to ensure that it, too, handles your field.

Business Names and Proper Names

If the conversation being passed through your ASR contains significant business and proper names, the way in which the ASRs handle these unknown words is vital for translation.

Example: “Lufthansa private jet” returning as:

Loved hands A private jet

Left hands of private jet

Turns a private jet

It is clear that a direct translation of the above will be a mess in any other language. In our tests, when an unknown word was used two or three times in a paragraph, some ASRs tended to return the same wrong word each time. This consistency made it easier to repair the problem with a word-swap dictionary. Other ASRs tended to return a different result for each instance, making dictionaries less reliable.

Newer Vocabulary

Another key ASR differentiator is the ability to handle words that are new to the lexicon. Discrepancies in transcribing these words are magnified, sometimes dramatically, during translation.

Example: “web tech” returning as:

The firm specializes in web attack.

Multiple Speakers Parameter

Many services offer a parameter for multiple speakers. The idea is for the system to be on alert for a change in voice sound and/or accent. During our recent testing, some engines took five seconds to properly recognize a new voice, and during that time, the words transcribed were damaged or incorrect.

Example immediately after change of speaker:

ASR vendor #1: “Thanks, Reagan. Thanks, Reagan, and that sounds fabulous.”

ASR vendor #2: “Banks Reagan. Thanks. Rygan. And that sounds fabulous.”

“Banks” may sound like “thanks” and be understandable in the original language transcript, but translated into other languages it becomes “banche” or “?????” without context. Thus, this five-second delay might render conversations unintelligible.
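If your ASR returns word timings and speaker labels, one possible safeguard is to flag the words that fall inside the window right after a speaker change, so they can be reviewed or retried before translation. The Python sketch below assumes a hypothetical Word record with text, start, and speaker fields; real vendors structure this data differently. The five-second default simply mirrors the delay we observed and should be tuned to your engine.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the audio (assumed field)
    speaker: str   # speaker label from the ASR (assumed field)

def flag_after_speaker_change(words, window=5.0):
    """Return indices of words spoken within `window` seconds of a speaker change."""
    flagged = []
    change_time = None
    previous_speaker = None
    for i, word in enumerate(words):
        if previous_speaker is not None and word.speaker != previous_speaker:
            change_time = word.start  # a new voice begins here
        if change_time is not None and word.start - change_time < window:
            flagged.append(i)
        previous_speaker = word.speaker
    return flagged

words = [
    Word("fine", 0.0, "A"),
    Word("Banks", 10.0, "B"),
    Word("Reagan", 10.5, "B"),
    Word("fabulous", 16.0, "B"),
]
suspect = flag_after_speaker_change(words)
```

In this sample, "Banks" and "Reagan" are flagged because they occur within five seconds of the change from speaker A to speaker B, while "fabulous" at 16 seconds is not.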

Translation Engines Repair Some Errors

On a happy note, some of the finest MT engines will, surprisingly, repair some of the errors made by ASR engines. Machine translation looks for sense in text, and it sometimes makes excellent guesses that correct faulty transcriptions. The frequency of correction will vary by error type, but 20 percent of errors repaired is a common result.

There is now great interest in ASR engines for their ability to be more quickly customized for an exact corporate corpus. The customizable counterparts, MT engines, have been around for quite a while, so we can see where the ASR train is headed. Heavy investment is currently being directed to these engines. There is good reason why ASRs are evolving, and the future of speech translation is exciting.

Sue Reager is president of Translate Your World and designer of the Tywi suite of software for across-language speech communication used in a variety of business applications and telecoms worldwide. Reach her at sreager@translateyourworld.com.
