October 1, 2007
By James A. Larson, program co-chair, SpeechTEK 2021
Forward Thinking

Extending VoiceXML's Impact to New Markets

As the non-English-speaking population has grown in the United States, so has interest in reaching this demographic. If you wish to extend your potential customer base, consider translating your VoiceXML applications.

Language translation services use at least three techniques to translate English words and phrases embedded in VoiceXML code into some other language:
•   Translation memory, where software looks up English words and phrases in a database to retrieve the target language equivalent. The software may generate a measure of the translation quality, which is usually the percentage of words in the sentence that match the words and phrases in the database.
•   Machine translation, where software uses language-specific rules to convert English words and phrases into another language.
•   Human translation, where a human translator converts the English words and phrases into the target language.
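The translation memory technique and its quality measure can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the three memory entries, the `lookup` function name, and the greedy longest-phrase-first matching are assumptions made for the example, and punctuation handling is left out.

```python
# Toy translation memory: English phrase -> Spanish phrase (illustrative entries only).
TRANSLATION_MEMORY = {
    "welcome": "bienvenido",
    "thank you": "gracias",
    "goodbye": "adiós",
}


def lookup(sentence, memory=TRANSLATION_MEMORY):
    """Translate greedily, longest phrase first, and report the matched-word percentage."""
    words = sentence.lower().split()
    output, matched, i = [], 0, 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in memory:
                output.append(memory[phrase])
                matched += j - i
                i = j
                break
        else:
            # No match: keep the English word for a machine or human translator.
            output.append(words[i])
            i += 1
    quality = 100 * matched / len(words) if words else 0.0
    return " ".join(output), quality
```

Calling `lookup("Thank you caller")` matches two of the three words, so it returns the partial translation "gracias caller" with a quality of about 67 percent, which signals that another technique still has to handle the rest.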

Figure 1 (below) illustrates how to translate VoiceXML code that supports English into equivalent VoiceXML code that supports Spanish. Word and phrase extraction software (Step 1) extracts the words and phrases to be translated from the VoiceXML code.
The extraction program extracts three types of text:
•   Grammars: words that can be recognized by the speech recognition system. Only the words in the "leaves" of the grammar tree are extracted. Many of these are standard words shared by many applications.
•   Prompts to be recorded: prerecorded words and phrases replayed to the user. Voice talent record the prompts that the VoiceXML application replays to the user. The Speech Synthesis Markup Language (SSML) elements, which are useful for explaining how prompts should be spoken, are maintained as part of the text and used by the voice talent when verbalizing the prompts.
•   Prompts to be synthesized: words used by a speech synthesis engine to produce audio. Only some SSML elements are maintained as part of the text. Human translators use these elements as hints when translating the text. Useful elements include paragraph (<p>), sentence (<s>), and substitution (<sub>), which gives the spoken form of an abbreviation or other written token.
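As a rough sketch of the extraction step, the Python below pulls prompt text and grammar "leaf" words out of a small VoiceXML document using the standard library's ElementTree. The sample document and the `extract` and `local` function names are made up for illustration. As a simplification, the sketch flattens prompts to plain text, whereas the extraction described above keeps the useful SSML elements.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration.
VXML = """<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="order">
    <field name="size">
      <prompt>What size would you like?</prompt>
      <grammar version="1.0" root="size">
        <rule id="size">
          <one-of>
            <item>small</item>
            <item>large</item>
          </one-of>
        </rule>
      </grammar>
    </field>
  </form>
</vxml>"""


def local(tag):
    """Drop any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def extract(vxml_text):
    """Return (prompts, grammar_words) found in a VoiceXML document."""
    root = ET.fromstring(vxml_text)
    prompts, words = [], []
    for el in root.iter():
        name = local(el.tag)
        if name == "prompt":
            # Join text across nested elements and normalize whitespace.
            prompts.append(" ".join("".join(el.itertext()).split()))
        elif name == "item" and len(el) == 0 and el.text and el.text.strip():
            # Only leaf items (no child elements) hold recognizable words.
            words.append(el.text.strip())
    return prompts, words
```

Here `extract(VXML)` yields one prompt and the two grammar words "small" and "large", ready to be written to the word and phrase file.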

Figure 1

The extraction program places the English text into an English word and phrase file formatted to the XLIFF (XML Localization Interchange File Format) specification. XLIFF lets multiple tools work on the same English text and stores information that supports the translation process. Translation service employees may then apply translation memory software (Step 2a), machine translation software (Step 2b), and other software to produce Spanish translations, and store data about each piece of Spanish text, such as the name of the tool or human who performed the translation and the estimated quality of the translation.
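To make the word and phrase file concrete, here is a minimal Python sketch that writes candidate translations into an XLIFF-style document. The element and attribute names (`trans-unit`, `source`, `alt-trans`, `origin`, `match-quality`) follow XLIFF 1.2 as I understand it. The `build_xliff` function, the `order.vxml` file name, and the unit data are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_xliff(units, source_lang="en", target_lang="es"):
    """units: list of (id, english_text, [(spanish_text, origin, quality), ...])."""
    xliff = ET.Element("xliff", version="1.2",
                       xmlns="urn:oasis:names:tc:xliff:document:1.2")
    file_el = ET.SubElement(xliff, "file", {
        "original": "order.vxml",  # hypothetical source file name
        "datatype": "xml",
        "source-language": source_lang,
        "target-language": target_lang,
    })
    body = ET.SubElement(file_el, "body")
    for unit_id, english, candidates in units:
        unit = ET.SubElement(body, "trans-unit", id=unit_id)
        ET.SubElement(unit, "source").text = english
        for spanish, origin, quality in candidates:
            # Each candidate records who or what produced it and how good it is.
            alt = ET.SubElement(unit, "alt-trans", {
                "origin": origin,
                "match-quality": str(quality),
            })
            ET.SubElement(alt, "target").text = spanish
    return ET.tostring(xliff, encoding="unicode")
```

Each `alt-trans` entry holds one candidate from Step 2a or 2b, so the human editor in Step 3 can compare the alternatives side by side.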

A human editor (Step 3) reviews the Spanish translations, selects the best of several alternative translations, or retranslates the text. The human editor may also insert new words and phrases into the translation database. Integration software (Step 4) then inserts the Spanish text back into VoiceXML code. This process reduces the human effort to translate between any pair of languages.
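The integration step might look something like the following Python sketch, which swaps approved Spanish text into the prompts of a VoiceXML document and sets `xml:lang` on the root element. The `integrate` function and the `es-ES` default are assumptions for the example. A production tool would more likely match text by XLIFF unit ID than by the English string, and would also handle grammars and prompts containing SSML.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize VoiceXML elements without an invented "ns0:" prefix.
ET.register_namespace("", VXML_NS)


def integrate(vxml_text, translations, lang="es-ES"):
    """Replace each prompt's English text with its approved Spanish translation."""
    root = ET.fromstring(vxml_text)
    root.set("{%s}lang" % XML_NS, lang)
    for el in root.iter():
        if el.tag.rsplit("}", 1)[-1] == "prompt":
            english = (el.text or "").strip()
            if english in translations:
                el.text = translations[english]
    return ET.tostring(root, encoding="unicode")
```

Prompts with no approved translation are left in English, which makes any gaps easy to spot during the expert review.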

When a VoiceXML application is revised, it may not be necessary to retranslate all of the text in the application. If an element were added to VoiceXML to indicate which English text has been revised, the extraction program could extract only the revised text for translation.
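Because VoiceXML has no such marker today, the sketch below stands in a hypothetical `revised="true"` attribute on `<prompt>` for it. Both the attribute and the `extract_revised` function are inventions for illustration, not part of any specification.

```python
import xml.etree.ElementTree as ET


def extract_revised(vxml_text, marker="revised"):
    """Return the text of prompts carrying the hypothetical revised="true" marker."""
    root = ET.fromstring(vxml_text)
    return [
        " ".join("".join(el.itertext()).split())
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "prompt" and el.get(marker) == "true"
    ]
```

Only the flagged prompts are returned, so unchanged text keeps its existing Spanish translation and is never sent through the pipeline again.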

Experts in the target language should always review the translated application and verify that everything is socially and culturally acceptable. Experts verify that the translated application conveys the appropriate formality, personalization, and style and avoids phrases and idioms that are technically correct yet inappropriate in the target language.

So if you want to extend the reach of your IVR application, consider translating it into the language of your target audience. With technology, many of the routine and boring translation tasks can be automated, leaving human translators to concentrate on the difficult-to-translate phrases.

Dr. James A. Larson is an independent consultant and VoiceXML trainer, co-chair of the World Wide Web Consortium’s Voice Browser Working Group, and author of The VXML Guide. He can be reached at jim@larson-tech.com.
