AppTek Launches Metadata-Informed Neural Machine Translation System

AppTek, a provider of artificial intelligence for automatic speech recognition (ASR), neural machine translation (NMT), natural language processing/understanding (NLP/U), and text-to-speech (TTS) technologies, today released a neural machine translation system that accepts metadata as input to customize the MT output. It also expanded its core machine translation platform to support hundreds of language and dialect pairs.

With AppTek's new metadata-informed NMT platform, companies can access a single NMT system that handles multi-domain, multi-genre, and multi-dialect content. By feeding additional metadata into the system, they gain more control over the MT output.

Examples of the MT output customizations possible include the following:

  • Style - switch between formal and informal styles, such as those of a documentary and a telenovela, and get a translation with an appropriate politeness register depending on speaker status and relationships;
  • Length Control for Automatic Dubbing and Subtitling Tasks - generate shorter or longer translations with minimal information loss or distortion for tasks with hard length constraints;
  • Speaker Gender - toggle to the correct speaker gender, which influences inflections for certain parts of speech, especially in morphologically rich languages such as Czech;
  • Domain - adapt to the genre of the text, such as news programs, patents, or talk shows, to increase overall accuracy and favor in-domain, relevant translations of ambiguous words at the document level;
  • Extended Context - optionally have the system consider neighboring sentences within a document when translating a particular sentence, so that ambiguities such as pronoun translation can be resolved;
  • Glossary - account for official or mandatory translations which the system might otherwise translate differently; and,
  • Language Variety - account for multiple languages and dialects within a single system, as well as handling mixed-language content.
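The release does not say how the metadata reaches the model. One approach widely described in the NMT research literature is to encode such side information as special tokens added to the source sentence, which the model learns to condition on during training. The sketch below assumes that approach. The `Metadata` class, the `tag_source` function, the `<name:value>` tag format, and the `<gloss>` and `<sep>` markers are all hypothetical illustrations, not AppTek's actual interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical metadata fields mirroring the customizations listed above.
@dataclass
class Metadata:
    style: Optional[str] = None    # e.g. "formal" or "informal"
    length: Optional[str] = None   # e.g. "short", "normal", "long"
    gender: Optional[str] = None   # speaker gender, e.g. "female"
    domain: Optional[str] = None   # e.g. "news", "patents", "talk_show"
    variety: Optional[str] = None  # target language variety, e.g. "pt-BR"


def tag_source(
    sentence: str,
    meta: Metadata,
    context: Optional[List[str]] = None,
    glossary: Optional[Dict[str, str]] = None,
) -> str:
    """Build a tagged source string for a tag-aware NMT model (hypothetical format)."""
    # 1. Each metadata value that is set becomes a pseudo-token, in field order.
    tags = [f"<{name}:{value}>" for name, value in vars(meta).items() if value is not None]

    # 2. Glossary terms are annotated inline with their mandatory translation.
    #    (Naive substring replacement; a real system would match whole terms.)
    if glossary:
        for term, target in glossary.items():
            sentence = sentence.replace(term, f"{term} <gloss> {target} </gloss>")

    # 3. Extended context: preceding sentences are joined with a separator token.
    parts = list(context or []) + [sentence]
    return " ".join(tags + [" <sep> ".join(parts)])


if __name__ == "__main__":
    meta = Metadata(style="formal", gender="female", domain="talk_show")
    print(tag_source("I am tired.", meta, context=["She left early."]))
    print(tag_source("Open the portal.", Metadata(), glossary={"portal": "portál"}))
```

Keeping every control as a plain token in the input is what would let one multi-purpose model serve all of these use cases. Training data would need the same tags attached, for example labeling each sentence pair with its domain or with a length bucket computed from the target.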

"By incorporating metadata to influence the MT output we are able to inject some world knowledge into our platform," said Evgeny Matusov, AppTek's lead science architect for neural machine translation, in a statement. "This improves the overall quality and adaptability of the system output and can be accomplished within a single multi-purpose system designed to reduce environmental footprint and cost."

AppTek's metadata-informed MT technology is now available for translation from English to selected European languages and their varieties.

"As the demand for content localization continues to skyrocket, enterprises need to continue to innovate and find new ways to further accelerate production workflows," said Kyle Maddock, senior vice president of marketing at AppTek, in a statement. "Our metadata-informed MT system has been specifically designed with translation professionals in mind, providing them with more control over the MT output, which can further speed up the localization process."

In addition to its metadata-informed NMT system, AppTek has expanded its core MT platform to cover an extensive list of languages and dialects, including Indic and Slavic languages. It now supports Afrikaans, Albanian, Amharic, Arabic (multi-dialect), Armenian, Azerbaijani, Belarusian, Bengali, Bosnian, Bulgarian, Catalan, Chinese (multi-dialect), Croatian, Czech, Danish, Dari, Dutch, English (multi-dialect), Estonian, Farsi, Finnish, French (multi-dialect), Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Kyrgyz, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Mongolian, Norwegian, Pashto, Polish, Portuguese (multi-dialect), Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Somali, Spanish (multi-dialect), Swedish, Tagalog, Tamil, Telugu, Thai, Tigrinya, Turkish, Turkmen, Ukrainian, Urdu, and Uzbek.
