Speech Technology Magazine

 

The Global Transition

Many problems can be solved with a little precoding analysis.
By Sue Ellen Reager - Posted Jan 5, 2011

The ideal internationalized application is global from the start: it shifts from one grammar set to another, and from one code branch to another, as new languages are added. However, the ideal is rarely reality. More often, an existing application is a hit (albeit a small one) with a global company, after which the developer is asked to take the application global.

Linguistically, a few approaches could save developers time and pain when first going global. Most applications are designed with a single language in mind, so adding a second language often appears to require a mountain of code rework: new grammars, dates, times, number sets, TTS, voice prompts, video, and text.

Banking applications are perhaps the most difficult to translate: They were a bear to create in the first language and can become a nightmare when translated into Asian, Middle Eastern, and Slavic languages, because the word order of those languages bears almost no relation to that of the application’s original language. Dozens, if not hundreds, of sentences and phrases fall apart on recognition and playback because other languages place sentence parts in different positions.

For companies that lack experience in speech application internationalization, some stress could be mitigated by internationalizing the application code using English content before tackling the foreign languages. Significant organization and problem-solving can also occur before coding, through linguistic analysis of the documentation and identification of potential issues that could require code changes for other languages. The majority of issues tend to fall into the following categories:

• names of people and places as subject;
• dates/times and the little words;
• numbers as a set; and
• currency amounts as decimal positions.

Names. With names, programmers breathe a sigh of relief, assuming <fullpathaudio> can use pre-existing code. Upon translation, however, a spoken name could easily end up in the wrong position in the sentence. If the application code inserts a spoken name, verify the name’s position in the sentence, and if the name is not the subject of the sentence, mark these areas as “might be affected.” Asian and Slavic languages must reword phrases such as “transferring to…<name>” or “received from…<name>.” Slavic languages must be reworded so that the name becomes the subject, not the object, of the sentence, and Asian languages speak the name first: “<name> to transferring.”
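One way to keep the name position out of the code is to store each language's segment order as data. The following is only a sketch under assumed names: TEMPLATES, build_prompt, the "name-first" language key, and the segment labels are all hypothetical, and the word orders are stand-ins rather than real translations.

```python
# Hypothetical per-language templates: each is an ordered list of audio
# segments, with "{name}" marking where the spoken name is inserted.
TEMPLATES = {
    "en": ["<transferring to>", "{name}"],
    # Stand-in for a language that speaks the name first.
    "name-first": ["{name}", "<to transferring>"],
}


def build_prompt(language, name_audio):
    """Return the ordered playback list for one language."""
    return [
        name_audio if segment == "{name}" else segment
        for segment in TEMPLATES[language]
    ]


print(build_prompt("en", "name_0042.wav"))
print(build_prompt("name-first", "name_0042.wav"))
```

With the order held in data, adding a language that moves the name means adding a template, not a new code branch.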

Dates/Times. Rework can be avoided by using only one date pattern: choose either with or without the weekday and year. Mark the little words that appear before dates and times, such as at, by, is, will be, before, to, from, and between. Some languages use a different date/time format after each of these words. For some applications, it is advisable to reword the text so that only one or two of these words appear before date/time variables, and to inform the translators of the constraints on the system prompts available for dates and times.
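Marking the little words can be partly automated during the precoding review. The sketch below assumes prompt text in which date and time variables are written as <date> and <time>; that placeholder convention, and the names LITTLE_WORDS and mark_lead_ins, are illustrative assumptions rather than part of any particular platform.

```python
import re

# The little words listed above; longer phrases are tried first.
LITTLE_WORDS = sorted(
    ["at", "by", "is", "will be", "before", "to", "from", "between"],
    key=len,
    reverse=True,
)

# A little word followed by whitespace and a <date> or <time> placeholder.
LEAD_IN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in LITTLE_WORDS) + r")\s+<(date|time)>",
    re.IGNORECASE,
)


def mark_lead_ins(prompt):
    """Return (little word, variable kind) pairs found in one prompt."""
    return [
        (match.group(1).lower(), match.group(2).lower())
        for match in LEAD_IN.finditer(prompt)
    ]


print(mark_lead_ins("Your payment is due by <date>"))
# [('by', 'date')]
```

Counting the distinct pairs across all prompts gives a rough picture of how many date/time formats each target language might need.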

Numbers. Most developers are familiar with Spanish un, una, and uno. But Japanese has 12 basic number sets, Chinese has 15, and Russian has 25. Also, many languages have more than one plural; it is as if English used not only messages but also messagi and messagu. Which plural to play depends on the last digit of the number; thus 23 plays the plural associated with the number 3. Preparing and testing the three English number sets (one, first, single) and additional new plurals might save rework later in the cycle.
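Plural selection can be tested with English stand-ins before any translation arrives. The sketch below uses the categories commonly described for Russian, including the exception for numbers ending in 11 through 14, which goes one step beyond the last-digit rule described above. The function name, the PLURAL_PROMPTS table, and the pairing of the made-up words with categories are assumptions for illustration only.

```python
def plural_category(count):
    """Pick a plural category for a Russian-style language."""
    last_two = count % 100
    last = count % 10
    if 11 <= last_two <= 14:
        return "many"  # 11-14 are an exception to the last-digit rule
    if last == 1:
        return "one"
    if 2 <= last <= 4:
        return "few"
    return "many"


# Hypothetical English stand-in prompts, one per category.
PLURAL_PROMPTS = {"one": "<message>", "few": "<messagi>", "many": "<messagu>"}

for count in (1, 3, 5, 11, 23):
    print(count, PLURAL_PROMPTS[plural_category(count)])
```

Here 23 selects the same prompt as 3, while 11 does not select the same prompt as 1.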

Currency. Higher numbers require close inspection in a system’s code. Each decimal position needs its own equivalent, rather than the current approach of grouping by thousands, where 10,000 = <ten><thousand> and 100,000 = <one hundred><thousand>. For example:
• English: <Your account balance is> <one hundred><eleven thousand> <eighty> <dollars> <and one> <cent>.
• Chinese: <eleven mon> <one thousand> <zero> <eighty> <dollar><zero><one> <cent> <your account balance is>.
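The difference between the two examples comes down to how the digits are grouped. The sketch below, with hypothetical function names, splits the whole-dollar part of the balance (111,080) into groups of three digits, as English does, and into groups of four, as the Chinese example does, and then breaks one group into its individual decimal positions.

```python
def group_digits(amount, group_size):
    """Split a non-negative integer into groups of group_size digits,
    most significant group first."""
    base = 10 ** group_size
    groups = []
    while True:
        amount, part = divmod(amount, base)
        groups.append(part)
        if amount == 0:
            break
    return groups[::-1]


def decimal_positions(amount):
    """Return the digit at each decimal position, most significant first."""
    return [int(digit) for digit in str(amount)]


print(group_digits(111080, 3))   # [111, 80]
print(group_digits(111080, 4))   # [11, 1080]
print(decimal_positions(1080))   # [1, 0, 8, 0]
```

Grouping by four yields eleven units of ten thousand followed by 1080, whose positions read one thousand, zero, eighty, matching the order of the Chinese example. Code that only knows the groups-of-three split cannot produce that sequence.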

Developers build great systems on reality. But language can be frustrating because it is built on a different reality, one that differs from the assumptions passed to developers through books, college courses, and best practices. A complete linguistic analysis of an application’s textual structure can clarify the new reality before foreign text or grammars are inserted. Testing code first with variations on English data can save developers significant time when coding for foreign languages.


Sue Ellen Reager is CEO of @International Services, a translation and software solutions company that performs translation, voice recording, and global system testing for speech and touch-tone applications as well as media localization.
She can be reached at sueellen@internationalservices.com.


