As Speech Translation Advances, Barriers Continue to Fall


“If you need perfect quality, there will be some editing involved,” he says.

The more technical or nuanced the information, the more a human will need to be involved in producing the final translation. Human translators likely will not be needed for mundane, simple work, but they will remain essential for more complex materials, according to Matusov.

Another challenge with speech translation is the need to standardize interfaces and data formats to ensure that systems are compatible. Research in this area is being fostered by speech translation consortiums, such as C-STAR and A-STAR, which are working to design standards for bilingual corpora, interfaces, and data formats to connect speech translation modules internationally.

Other ongoing challenges include speaker-dependent variations in speaking styles and pronunciations and external factors, such as acoustic noise or speech by other speakers, in real-world situations. Current systems continue to work best in quiet environments with one person speaking at a time, which is not how most conversations occur.

For the best translations, context is also essential, experts agree. Not only will it help with gender-specific translations, but also with formal and informal translations, Matusov says.

“You need to know how to manipulate really good neural translation software to get the right result,” Reager points out.

Reager also notes that many different types of translation engines exist for different types of uses. Google, she says, is among the best in many cases, and many experts agree.

According to a study by online translations platform One Hour Translation, Google Assistant is one of the top-performing real-time voice translators, ahead of Apple’s Siri, Amazon’s Alexa, and Microsoft’s Cortana and Skype Translate.

One newcomer in the speech translation space, though, threatens Google’s leadership position, many experts contend. That company, DeepL, is a German firm that launched its free Translator machine translation service in August 2017. DeepL’s technology converts words between English and 10 other languages (Chinese, Dutch, French, German, Italian, Japanese, Polish, Portuguese, Russian, and Spanish) and proposes approximations of language equivalence among them all using a two-step process via an English pivot. That, many experts believe, could ultimately make DeepL more accurate and nuanced than Google Translate.

Reager says DeepL offers the best translation for Chinese speakers in the United States, but for Chinese speakers in China, Sogou is a better option.

“Each speech translation engine has its own strengths and weaknesses,” Reager advises. “You should use the one that offers you the best return.”

When it comes to speech-based translation, many difficulties have already been overcome, but many others still remain. Most of them are related to the languages themselves. For one, some languages use different tenses and gender pronouns depending on the context, and translations have to take them into account.

The issue of gender misidentification is perhaps the hardest to address, and it goes beyond basic translation issues. To deal with it, researchers at the University of Trento in Italy in June proposed a benchmark—which they dubbed MuST-SHE—to evaluate whether speech translation systems fed textual data are constrained by the fact that sentences sometimes omit gender identity clues. The co-authors assert that these systems often exhibit gender bias, and that signals beyond text (like audio) provide contextual clues that might reduce this bias.

In machine translation, gender bias is at least partially attributable to the differences in how languages refer to males and females. Romance languages, for example, incorporate gender agreement into their grammatical constructs, with verb endings and other sentence constructs dependent on the gender of the nouns involved.

The MuST-SHE standard is a multilingual test set designed to uncover gender bias in speech translation. It comprises roughly 1,000 audio recordings, transcripts, and translations in English-to-French and English-to-Italian pairs from the open source MuST-C corpus, annotated with qualitatively differentiated and balanced gender-related phenomena. MuST-C is the largest multilingual corpus available for the direct speech translation task.
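To illustrate how a benchmark of this kind can surface gender bias, here is a minimal sketch of one way such scoring might work. The sentences, annotation fields, and token-matching rule are all invented for illustration and are simpler than MuST-SHE's actual evaluation protocol; the idea is only that each reference carries both the correctly gendered form and its wrong-gender counterpart, and the system's output is checked against the two.

```python
# Hypothetical sketch of a MuST-SHE-style gender accuracy check.
# Each example pairs a system hypothesis (here, English-to-French output)
# with the correctly gendered form and its wrong-gender counterpart,
# as a gender-annotated test set might provide them.
examples = [
    {"hypothesis": "elle est allée au marché",
     "correct": "allée", "wrong": "allé"},
    {"hypothesis": "il est allé au marché",
     "correct": "allé", "wrong": "allée"},
    {"hypothesis": "la chercheuse est arrivé",   # wrong-gender output
     "correct": "arrivée", "wrong": "arrivé"},
]

def gender_accuracy(examples):
    """Fraction of scorable examples whose hypothesis contains the
    correctly gendered form rather than the wrong-gender one."""
    scored = 0
    correct = 0
    for ex in examples:
        tokens = ex["hypothesis"].split()
        if ex["correct"] in tokens:
            correct += 1
            scored += 1
        elif ex["wrong"] in tokens:
            scored += 1
    return correct / scored if scored else 0.0

print(gender_accuracy(examples))  # 2 of 3 examples use the correct form
```

A real evaluation would align annotated terms to the hypothesis more carefully (handling inflection, multiple gendered words per sentence, and unscorable outputs), but the principle is the same: the wrong-gender variant makes bias directly measurable rather than hidden inside an overall quality score.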

“If, like human beings, ‘machine learning is what it eats,’ the different diet of machine translation and speech translation models can help them develop different skills,” the researchers write. “By eating audio-text pairs, speech translation has a potential advantage: the possibility to infer speakers’ gender from input audio signals.”

The report comes after Google introduced gender-specific translations in Google Translate chiefly to address gender bias. Scientists have proposed a range of approaches to mitigate and measure it, most recently with a leaderboard challenge and set of metrics dubbed StereoSet.

There are other issues to be further resolved in speech translation, but experts agree that mistranslations are becoming fewer and fewer as the underlying technologies continue to improve. And once that occurs, people will be able to resume international travel without the language and cultural barriers of the past.

Phillip Britt is a freelance writer based in the Chicago area. He can be reached at spenterprises@wowway.com.
