7 Ways to Avoid Code Rework for Language Translation

All over the world, people with far too much free time on their hands made strange decisions that produce the illogical results we call “language.” Someone somewhere decided that a table is a female, a rug is a male, and a pen is neuter. What possessed our English forefathers to decide that we must have two versions of “quick” (“quick” and “quickly”), but only one version of “fast”? Or to create plurals by adding “s” to the end, except for “sheep” and “deer”? Who in Asia and Germany made the decision to wait through an entire sentence to find out who did what at the end, not the beginning? And what were they drinking in France when they decided that “eau” is pronounced “o”?

Unlike language, code is logical and organized. There are no singulars and plurals, no past participles or imperfect tenses, no metaphors or imagery. We cannot coach a machine on irony or sarcasm, or tell a computer to do X on an Android device and then expect the code to use its brains to do Y on an iPhone.

After all the effort the development team has put into creating a system that works, going back to translate it can cause massive code rework, and the developers must often devote hundreds of hours to text that they cannot read or validate.

When it comes to interfaces, voice prompts, IVR, text concatenation, etc., there are certain steps that can be taken during initial conceptualization and coding that will lower costs and reduce later rework for localization. Here are seven of them.

Pull interface text from databases, not hard code.

From the early stages, pull interface titles, field content, error messages, checkbox/radio button content, etc., from databases, not hard code. The process of going back after the fact to pull out bits and pieces of text scattered all over an application to properly organize for translation can be a long, miserable task. The database is not necessarily the final storage place for translation, but it keeps the text data organized in one place from the beginning of the project. Once the text data is well organized, translation memory software like Trados and MemoQ can be used to keep track of translations and versioning.
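As a minimal sketch of this idea, the Python example below keeps interface strings in a database table keyed by string identifier and locale, and falls back to the source language when a translation is missing. The table name, column names, string identifiers, and sample text are all assumptions made for illustration, and an in-memory SQLite database stands in for whatever database the project actually uses.

```python
import sqlite3


def open_string_store() -> sqlite3.Connection:
    """Create an in-memory store with one row per (string_id, locale) pair.

    The schema and sample rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ui_strings ("
        " string_id TEXT NOT NULL,"
        " locale TEXT NOT NULL,"
        " content TEXT NOT NULL,"
        " PRIMARY KEY (string_id, locale))"
    )
    conn.executemany(
        "INSERT INTO ui_strings (string_id, locale, content) VALUES (?, ?, ?)",
        [
            ("login.title", "en", "Sign in"),
            ("login.title", "es", "Iniciar sesión"),
            ("login.error.password", "en", "Incorrect password."),
        ],
    )
    return conn


def ui_text(conn: sqlite3.Connection, string_id: str, locale: str,
            fallback_locale: str = "en") -> str:
    """Return the string for the locale, or the fallback locale's text."""
    for candidate in (locale, fallback_locale):
        row = conn.execute(
            "SELECT content FROM ui_strings WHERE string_id = ? AND locale = ?",
            (string_id, candidate),
        ).fetchone()
        if row is not None:
            return row[0]
    raise KeyError(string_id)


conn = open_string_store()
print(ui_text(conn, "login.title", "es"))           # translated string
print(ui_text(conn, "login.error.password", "es"))  # falls back to English
```

Because no text lives in the application code itself, the whole table can be exported for translation and re-imported without touching the program.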

Beware parsing pitfalls.

Many applications parse text based upon punctuation. Each character has a numeric code, so the program identifies the end of a sentence when a character with the expected code appears. Many languages use different characters to denote the end of a sentence. Chinese and Japanese, for example, use the ideographic full stop (。), and Hindi uses the danda (।).

And in the Thai language, a space is the end-of-sentence indicator.

Commas also differ. Japanese, for example, uses the ideographic comma (、), and Arabic uses its own comma (،).

And many languages do not use commas at all.
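One hedged way to avoid hard-coding a single character code is to keep the terminators in a table that can grow as languages are added. The Python sketch below is deliberately naive, and the set of characters is an illustrative subset rather than a complete list. It does not handle abbreviations, decimal numbers, or Thai, where finding a sentence boundary needs more than a punctuation lookup.

```python
# Illustrative subset of sentence-ending characters; extend per target language.
SENTENCE_TERMINATORS = {
    ".", "!", "?",
    "\u3002",  # ideographic full stop (Chinese, Japanese)
    "\uff01",  # fullwidth exclamation mark
    "\uff1f",  # fullwidth question mark
    "\u0964",  # Devanagari danda (Hindi)
    "\u061f",  # Arabic question mark
    "\u06d4",  # Arabic full stop (Urdu)
    "\u0589",  # Armenian full stop
}


def split_sentences(text: str) -> list[str]:
    """Split text after any character found in SENTENCE_TERMINATORS."""
    sentences = []
    current = []
    for ch in text:
        current.append(ch)
        if ch in SENTENCE_TERMINATORS:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep any trailing text that has no terminator.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Hello. How are you?"))
print(split_sentences("今日は晴れです。明日は雨です。"))
```

A production system would more likely rely on a segmentation library that follows the Unicode text segmentation rules, but even this small table shows why a single hard-coded period is not enough.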

Add “ghost prompts” to concatenated text and voice prompts with variables.

Many assumptions that coders make (and that books teach) result in incorrect translation.

Example: <The hotel guest will arrive on> <date>.

When the variable content is inserted, the results will not be correctly translated in about 40 percent of the world’s languages. Japanese, for example, should be ordered as:

Japanese: <Date> <on arrive hotel guest>.

Not only does the date appear at the beginning of the sentence in many major languages, but even the word “on” changes, with “on Monday” different from “on Tuesday.” And in Germanic languages, the “will” appears near the beginning, but the “arrive” appears after the date.

To partly get around these issues, coders can insert a “ghost prompt”:

Ghost prompt: <text> <date> <text>

These placeholders enable the translated text to at least appear in the correct order.
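In code, a ghost prompt can be approximated by storing one whole-sentence template per language, with a named placeholder that the translator is free to move. The Python sketch below assumes the date arrives already formatted for the target locale. The template dictionary, the function name, and the German and Japanese sentences are illustrative assumptions, not reviewed translations.

```python
# One full-sentence template per language; the translator decides where
# the {date} placeholder belongs. The sentences are illustrative only.
TEMPLATES = {
    "en": "The hotel guest will arrive on {date}.",
    "de": "Der Hotelgast wird am {date} ankommen.",
    "ja": "{date}にホテルのお客様が到着します。",
}


def render_arrival(locale: str, date: str) -> str:
    """Fill the locale's template with an already localized date string."""
    return TEMPLATES[locale].format(date=date)


print(render_arrival("en", "May 3"))
print(render_arrival("de", "3. Mai"))
print(render_arrival("ja", "5月3日"))
```

The calling code never concatenates fragments, so the date can land at the start, middle, or end of the sentence without a code change.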

Another option that may not be practical for all systems is to send the entire sentence in real time via an API to an automatic translation service.

Plan on 20 percent space inflation.

Prepare all interfaces for auto-expansion according to runtime content length. Expect any text in the form of sentences or paragraphs, button titles, or title text to become at least 20 percent longer than it is in the original language. Plan for button text, in particular, to expand to five times its original length or more. Leave far more white space on interfaces and web pages than you normally would.
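These rules of thumb can be turned into a simple check that flags translations likely to overflow the space reserved for them. The Python sketch below uses character counts as a crude stand-in for rendered width, which is an assumption; real layouts should measure pixels in the target font. The constant and function names are hypothetical, and the two factors come straight from the guidance above.

```python
import math

# Expansion factors taken from the rules of thumb in this article.
SENTENCE_EXPANSION = 1.2  # at least 20 percent longer
BUTTON_EXPANSION = 5.0    # button text may grow fivefold


def reserved_chars(source_text: str, is_button: bool = False) -> int:
    """Number of characters to reserve for a translation of source_text."""
    factor = BUTTON_EXPANSION if is_button else SENTENCE_EXPANSION
    return math.ceil(len(source_text) * factor)


def overflows(source_text: str, translated_text: str,
              is_button: bool = False) -> bool:
    """True when the translation is longer than the reserved budget."""
    return len(translated_text) > reserved_chars(source_text, is_button)


print(reserved_chars("Save", is_button=True))
print(overflows("Your changes were saved.",
                "Ihre Änderungen wurden gespeichert."))
```

Running a check like this over every translated string before release catches truncated labels without anyone needing to read the target language.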

Expect font size changes.

Languages such as Chinese, Japanese, and Korean have many more strokes in each character and must be two to four pixels larger than languages that use variations of the Latin alphabet. And many Arabic fonts, even though set to what would normally be considered a readable size, appear on the screen almost 50 percent smaller than the pixel size denotes. Keep a table of fonts and font sizes for each language in order to control these variations.
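Such a table can be as small as a dictionary keyed by language code. In the Python sketch below, the font families are examples and the pixel sizes are assumptions that would need tuning against real screens. Only the idea of a per-language lookup with a default comes from the text above.

```python
# Hypothetical per-language font table: (font family, size in pixels).
DEFAULT_FONT = ("Noto Sans", 14)
FONT_TABLE = {
    "zh": ("Noto Sans SC", 16),       # Chinese: larger for stroke density
    "ja": ("Noto Sans JP", 16),       # Japanese
    "ko": ("Noto Sans KR", 16),       # Korean
    "ar": ("Noto Naskh Arabic", 18),  # Arabic: compensate for small rendering
}


def font_for(locale: str) -> tuple[str, int]:
    """Look up the font by the language part of a locale such as 'ja-JP'."""
    language = locale.replace("_", "-").split("-")[0].lower()
    return FONT_TABLE.get(language, DEFAULT_FONT)


print(font_for("ja-JP"))
print(font_for("en-US"))
```

Keeping the sizes in data rather than in layout code means a designer can adjust one language without touching the others.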

Include support for persons with special needs.

There are, of course, a number of things required to prepare for special needs, and the laws globally are becoming stricter. The first weakness of nearly all systems is a difficult login interface that can prove impossible for visually impaired persons to use even with a screen reader. And lack of availability of text for those with hearing difficulties can be another issue. Consulting with specialists in special needs during the design phase will speed compliance and avoid much retrofit coding.

Take care in selecting speech tech API vendors, and bring in a localization expert.

Selecting a vendor for API calls, whether it’s for automatic speech recognition (ASR), machine translation, or text-to-speech, is an important decision. If left to their own devices, technologists will take the easiest vendor to insert into a project, be delighted when data arrives, and move on to something else. They fail to realize that when some audio is sent or uploaded to certain ASR vendors, half the return text may be missing. Or that a translation service returns wildly wrong results for anything that contains metaphors or imagery. Technologists most likely will not be able to judge accuracy, so they make judgments based upon speed and ease of implementation.
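One way to give technologists an accuracy number, even for a language they cannot read, is to compare a vendor's output with a transcript that a native speaker has verified. The Python sketch below computes word error rate, a common ASR metric based on word-level edit distance. The function name and sample sentences are assumptions for illustration. Because the function splits on spaces, it does not suit languages written without spaces between words, where a character-level rate is the usual substitute.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "the hotel guest will arrive on monday"
vendor_output = "the hotel guest"  # more than half of the text is missing
print(round(word_error_rate(reference, vendor_output), 2))
```

Scoring each candidate vendor on the same verified samples turns "data arrived" into a comparison that includes accuracy as well as speed and ease of integration.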

When executives ask a development team to translate their system, perceiving localization as something that can be tasked to developers, they should consider consulting with a tech localization specialist during the early stages. The time to help the developers is at the start of the project, whenever possible.

Sue Reager specializes in across-language speech communication, applications, and context engines. Her innovations are licensed by Cisco Systems, Intel, and telecoms worldwide.
