Kentucky Fried Chicken expanded to China in the late 1980s, and little more than a decade later became one of the most-recognized international brands in the country. The Colonel’s original recipe apparently translated well among Chinese consumers, even though KFC’s initial slogan—which went from "Finger lickin’ good" in English to "Eat your fingers off" in Mandarin Chinese—didn’t. Tales of these business mix-ups are not uncommon. Some, like the KFC dilemma, are true, whereas others, such as the rumor that the Mandarin version of "Come alive with Pepsi" is "Pepsi brings your ancestors back from the dead," are harder to verify. Apocrypha notwithstanding, the proliferation of these anecdotes underscores the complexity of translating a brand in a foreign business environment—discerning general meaning is inadequate because the varying connotations and implications of certain words often elude non-native speakers.

That confusion involved only a single textual sentence, so it stands to reason that in an interactive voice response (IVR) system, that complexity is much more layered. By definition, an IVR engages a caller in a guided conversation. Not only do designers have to consider syntax and grammatical structure, but they also have to account for the speed with which a language is commonly spoken and the voice actor’s accent. Still, many enterprises bang out a direct translation from their English IVR to whatever foreign language they require. It’s a quick, cheap, and lousy idea.

José Elizondo, senior manager for multilingual VUI design for Nuance Communications’ North American Professional Services unit, bristles at the word "translation," which, he believes, is a reductive definition of the actual process. There’s more to consider than pulling raw meaning from a sentence.

"I’ve seen some companies deploy really horrific things because the translator was from one specific [Caribbean] country where the [Spanish] verbiage they provided works," Elizondo says, "but it happens to have horrendous sexual connotations in the rest of Latin America."

This is why localization, wherein a speech system is adapted to account for both the linguistic and cultural nuances of a specific region, is so important for enterprises reaching out to non-English-speaking customers. Jenni McKienzie, a VUI designer for online travel agency Travelocity, understands that. She had a relatively smooth time building a system in Mexican Spanish—the dominant form of Spanish in North America. Sabre Holdings, which owns Travelocity, had an employee who was a native Mexican and, equally important, had a human factors background.

"Then they wanted to start Travelocity Argentina," McKienzie says, "and they asked, ‘We can use the same app, right?’ and I was like, ‘Uh. Not exactly.’" Because of variations between Mexican and Argentinean Spanish, McKienzie couldn’t slap her Mexican script with its Mexican voice talent onto an Argentinean populace—at least not without insulting Travelocity’s prospective customers.

Unfortunately, Travelocity didn’t have a native Argentinean in house, let alone one with the human factors experience necessary for good VUI design. So McKienzie farmed out localization duties to Your Mother Tongue, a three-person, Toronto-based translation house specializing in Spanish languages. She sent her Mexican Spanish script to the house, which localized it using freelancers from Argentina.

McKienzie is savvy enough to know the requirements localization demands, but she’s also aware of the available resources—or lack thereof. Travelocity’s speech team consists of only three people, and because nearly all of the company’s call volume is in English, it has only an English recognizer. Travelocity’s foreign language IVRs are touch-tone.

Even Your Mother Tongue, whose client list includes Random House’s Ballantine Books division, Reuters, and the Consulate General of Mexico, predominantly translates long documents. The company has less experience working with contact center scripts.

When it does get an IVR script, the localization effort is limited by the client’s unique needs. "Spanish tends to be 15 to 20 percent longer than English," Your Mother Tongue president David Gomez explains. Often, a client wants the Spanish recording’s length to correspond with the original English.

For example, the IVR script Your Mother Tongue localized for apparel retailer Men’s Wearhouse included the phrase "for any Men’s Wearhouse," which translated to "cualquier tienda de Men’s Wearhouse"—nearly twice the length. "We had to get rid of the word tienda, which means store, and just assume that people knew what the Men’s Wearhouse was," Gomez recalls.
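
A length constraint like the one Gomez describes can be checked mechanically before a script reaches the recording studio. The sketch below is only an illustration: it assumes character count is a rough stand-in for recording length, and the function name and the 20 percent allowance are hypothetical choices, not anything the companies mentioned here are known to use.

```python
def exceeds_length_budget(source_prompt: str, localized_prompt: str,
                          allowed_growth: float = 0.20) -> bool:
    """Return True when the localized prompt outgrows its allowance.

    Assumption: character count approximates spoken duration. A real
    project would more likely time the recorded audio itself.
    """
    budget = len(source_prompt) * (1 + allowed_growth)
    return len(localized_prompt) > budget


english = "for any Men's Wearhouse"
full_spanish = "cualquier tienda de Men's Wearhouse"
trimmed_spanish = "cualquier Men's Wearhouse"

# Compare the full and the trimmed Spanish phrase against the same budget.
print(exceeds_length_budget(english, full_spanish))
print(exceeds_length_budget(english, trimmed_spanish))
```

Flagging over-budget prompts early gives the localizer a chance to trim wording deliberately, as with "tienda," rather than under studio time pressure.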

In this case, the limits of the English-language IVR constrained the Spanish-language IVR. With Travelocity’s system, functionality actually decreased due to technological barriers.

But because downgrading to dual-tone multifrequency systems doesn’t work for every company, there are considerations that enterprises need to make when localizing a fully speechified system. New homophones often crop up and confuse the recognizer. Additionally, the call flow might require resequencing because after localization, what took five prompts to say in English suddenly requires two or 12 in another language. These issues demand a designer who is familiar with the nuances of a foreign culture and its language.

English is an odd, ungainly language, so sprawling that the Oxford English Dictionary could flatten any foreign counterpart. And because its reach is so international, it’s easy to forget that English grammatical structure doesn’t always extend to other languages. Most Indo-European languages—English being a notable exception—have gendered nouns that need to be accounted for when designing and coding an IVR. And while English speakers may or may not use classifiers to quantify a noun, the presence of classifiers in the Chinese language, used in conjunction with numerals, is mandatory.
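
One way to keep grammatical gender and classifiers out of the application logic is to store them alongside each noun. The following is a minimal sketch under stated assumptions: the lexicon layout, the function name, and the three sample nouns are all hypothetical, chosen only to show where the per-language information could live.

```python
# Illustrative noun lexicon. Every entry is a made-up example, not data
# from any real IVR deployment.
LEXICON = {
    "en": {"ticket": {"noun": "ticket"}, "suitcase": {"noun": "suitcase"}},
    "es": {
        "ticket": {"noun": "boleto", "gender": "m"},
        "suitcase": {"noun": "maleta", "gender": "f"},
    },
    "zh": {"ticket": {"noun": "票", "classifier": "张"}},
}

# Spanish indefinite article, selected by the noun's gender.
SPANISH_ONE = {"m": "un", "f": "una"}


def one_item_phrase(locale: str, concept: str) -> str:
    """Build the phrase for 'one <concept>' in the given locale."""
    entry = LEXICON[locale][concept]
    if locale == "en":
        return f"one {entry['noun']}"
    if locale == "es":
        return f"{SPANISH_ONE[entry['gender']]} {entry['noun']}"
    if locale == "zh":
        # Numeral, then the mandatory classifier, then the noun.
        return f"一{entry['classifier']}{entry['noun']}"
    raise ValueError(f"unsupported locale: {locale}")


print(one_item_phrase("es", "suitcase"))
print(one_item_phrase("zh", "ticket"))
```

The point of the design is that adding a language means adding lexicon data and one rule, not rewriting every prompt that mentions a quantity.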

Consider the specific way a culture formats the presentation of dates. Even countries speaking the same language don’t, in this case, always speak the same language. Americans typically structure dates by month-day-year, while their British counterparts adhere to day-month-year.
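
The date difference is simple enough to sketch in a few lines. The pattern table and function name below are hypothetical; a production system would more likely draw its patterns from an established locale database than hard-code them.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical per-locale patterns for a date prompt.
DATE_PATTERNS = {
    "en-US": "{month} {day}, {year}",  # month-day-year
    "en-GB": "{day} {month} {year}",   # day-month-year
}


def prompt_date(d: date, locale: str) -> str:
    """Render a date in the order the locale's callers expect."""
    pattern = DATE_PATTERNS[locale]
    return pattern.format(month=MONTHS[d.month - 1], day=d.day, year=d.year)


travel_day = date(2008, 3, 4)
print(prompt_date(travel_day, "en-US"))
print(prompt_date(travel_day, "en-GB"))
```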

Times and numbers are also tricky because each language has its own unique way of stating them. If a flight arrives in Norway at 15:20, or 3:20 p.m., a Norwegian will tell you it arrives 10 minutes to half four. And if you have one hundred and one dollars in English, you’d have one hundred one dollars in German and one hundred zero one dollar in Mandarin Chinese.
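
To see why a time prompt cannot simply be translated word for word, consider a rough sketch of the half-hour-relative phrasing in the Norwegian example. It produces an English gloss rather than Norwegian text, and the cutoffs at 15 and 45 minutes are an assumption about where speakers switch reference points, not a rule taken from this article.

```python
HOUR_WORDS = ["twelve", "one", "two", "three", "four", "five", "six",
              "seven", "eight", "nine", "ten", "eleven"]


def _minutes(n: int) -> str:
    return f"{n} minute" if n == 1 else f"{n} minutes"


def half_relative_time(hour: int, minute: int) -> str:
    """English gloss of a 24-hour time phrased around the coming half hour."""
    this_hour = HOUR_WORDS[hour % 12]
    next_hour = HOUR_WORDS[(hour + 1) % 12]
    if minute == 0:
        return f"{this_hour} o'clock"
    if minute <= 15:
        return f"{_minutes(minute)} past {this_hour}"
    if minute < 30:
        # "Half four" names the half hour before four, i.e. 3:30.
        return f"{_minutes(30 - minute)} to half {next_hour}"
    if minute == 30:
        return f"half {next_hour}"
    if minute < 45:
        return f"{_minutes(minute - 30)} past half {next_hour}"
    return f"{_minutes(60 - minute)} to {next_hour}"


print(half_relative_time(15, 20))
```

An English prompt assembled as hour, then minutes, has no slot for "the next hour," which is why the prompt logic, and not just the recordings, has to change.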

All of these factors need to be considered when localizing, except there’s a problem: Changing the words and grammatical structures of languages within an IVR often means changing the coding as well. And for most programmers, the further the language gets from their own native tongues, the greater the exertion.

If a caller tells an English-language speech system, "Eh, well, I guess my preference is to travel by plane," the IVR trashes the unnecessary verbiage and vocal pauses to pick out crucial words like "plane."

"What they do is try to cover what people might say, and they do a great job in English, absolutely," says Sue Ellen Reager, founder and CEO of global translation services company @International Services. "But what happens in Norwegian is that my and your come after the plane, not before. So your code, even if you translate it, is all in the wrong order."

Coded Words
Performing a thorough localization of an IVR is particularly tricky because of the system’s dual nature. There’s outgoing activity, which is what the system speaks to the customers, and there’s the incoming activity, which requires the system to recognize the caller’s speech. "When you’re talking to VUI designers, that’s where the nightmares happen," Reager says.

In the January issue of Speech Technology, Reager, who speaks 10 languages and knows the grammar of 50 ("Hindi is where I draw the line"), said the first step in achieving "an acceptable level of localization" is to "assume from the get-go that your system will be translated into many languages and have many versions of the original available." This means having a multilingual designer to anticipate potential problems available as early as the requirements-gathering session.

Nuance’s Elizondo speaks five languages with varying degrees of fluency. In 12 years, he has designed more than 50 systems in more languages than he actually knows. "Once you’ve done several multilingual systems," he says, "you know what kind of questions to ask and how to spot a problem, even if it’s not your own language."

Even a system that seems simple in English might evoke a sense of martyrdom for the designer. An English-language IVR that accepts only yes-no utterances is relatively straightforward. Its limited functionality constrains speech, and thus has high recognition rates. Additionally, its questions are direct, which in English is helpful and time-saving. Translating that same IVR out of English, however, can open a Pandora’s box of issues.

In Japan, not only is it proper to frame the prompts and make them less pointed, it’s not acceptable to explicitly acknowledge that you don’t know something. A prompt might instead state: "If you have that information, please say ‘continue.’ Otherwise, say ‘repeat.’" It’s not just the words that have changed; the underlying logic is also very different.

Complications with yes-no arise even in Western languages. The Finnish tend to respond to those types of questions with a verb. If a Portuguese system asks whether a caller speaks the language—"fala o português?"—the caller, instead of simply answering in the affirmative, will say "fala," which literally means "speak."

A multilingual person working with the speech team during development can write specifications to the team not to re-use yes-no grammars from the original English-language IVR. "So regardless of whether you’re the first person touching those grammars, which may be the English designer or speech scientist, you’ll know to create two files instead of one," Elizondo says. "And once you localize them, you can localize just the file that’s pertinent to that context and not go back and say, ‘Hey, I need to change the code so it involves 20 different grammars instead of one.’"
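
Elizondo's advice amounts to keying each grammar by dialog context as well as by language. A minimal sketch of that idea follows; the context names, the dictionary layout, and the `interpret` function are hypothetical, and the Portuguese entries are illustrative rather than a complete grammar.

```python
from typing import Optional

# One grammar per (locale, dialog context), never one shared yes-no grammar.
# "fala" is the article's Portuguese example; the rest is illustrative.
GRAMMARS = {
    ("en", "confirm_speaks_language"): {"yes": True, "no": False},
    ("en", "confirm_booking"): {"yes": True, "no": False},
    ("pt", "confirm_speaks_language"): {"sim": True, "fala": True, "não": False},
    ("pt", "confirm_booking"): {"sim": True, "não": False},
}


def interpret(locale: str, context: str, utterance: str) -> Optional[bool]:
    """Map an utterance to True/False, or None when it is out of grammar."""
    grammar = GRAMMARS[(locale, context)]
    return grammar.get(utterance.strip().lower())


print(interpret("pt", "confirm_speaks_language", "Fala"))
print(interpret("pt", "confirm_booking", "fala"))
```

The two English grammars start out identical, which looks redundant. The payoff comes at localization time, when the verb answer can be added to one context without touching the other or the calling code.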

Culture Clash
On July 1, 1997, Britain transferred sovereignty of Cantonese-speaking Hong Kong to China. Soon after, city signs and government announcements shifted to Mandarin, mainland China’s dominant dialect. Cantonese is more tempo-rubato, Mandarin more rhythmically consistent. And the gradual phaseout of Cantonese has interesting implications for the populace. "In Hong Kong, what was once impeccable Cantonese is slipping into the Mandarin word order," Reager says. "They’re becoming very confused over there."

Of course, that affects localization for Chinese-language IVRs, especially those deployed in Hong Kong. It can also affect the way an American IVR might function within a Chinese-American community. While Cantonese is the dominant Chinese dialect spoken in America, an individual elbowing through New York’s Canal Street will also hear Mandarin and even a little Shanghainese. San Francisco’s Chinese community, by contrast, speaks almost exclusively Cantonese.

The infusion of Cantonese into Mandarin in Hong Kong mirrors the emergence of Spanglish in the United States, of which IVR designers must be wary. One of Elizondo’s recent customers did a routing application for a department store. If a native Spanish speaker wanted the appliances department, he could request the departamento de las electrodomésticos. But Puerto Ricans already living in a bilingual environment, or second-generation Latin Americans in the United States, commonly requested the departamento de las appliances.

Additionally, when an individual immigrated also determines the relevance of a speech system. In California, Reager states, a Chinese-language system still refers to the IVR menu as an option choice, which is antiquated mainland Chinese terminology; the Chinese-American populace never updated to the word menu. "If you play that in California, people will say the translator is bad," she says. "What they don’t know is that they’re the ones that are off by 20 years."

Elizondo, who grew up in northern Mexico, is familiar with how a mind set developed in the old country can tinge perceptions in the new. There was a tendency in Mexico not to trust an enterprise with personal information. "Even in my case, coming to the U.S. and feeling comfortable making a transaction with a credit card over the phone was a big deal," he says. "It wasn’t a trivial concession." So a telecommunications company servicing southern Texas that requires a Social Security number to access an account might encounter some reluctance from a large portion of its clientele.

Or the customer might not even have a Social Security number. Elizondo gives the example of a cell phone provider in the southern U.S. with Mexican customers who often visit friends and family in the U.S. "So on one of their trips to the U.S., they buy their cell phone there," he says. "Why should they have a Social Security number? If they don’t have one, and you don’t design the application so there’s the option to do a different type of authentication, you can’t service them."

To reconcile this, the IVR might present other options—asking for another form of identification, like a phone number, that the client would be more willing to provide. Elizondo adds that this alternate should be presented gracefully, in a way that doesn’t imply failure on the caller’s part.

"And make sure it’s something you can count," he emphasizes. "Many companies don’t realize how many people are in that situation, and at least they’ll be able to look at the logs and say, ‘Hey, this is 30 percent of my population. Maybe I should do something different.’"

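The alternate authentication path and the counting Elizondo asks for can live in the same small piece of logic. The sketch below is an assumption-laden illustration: the class name, the method labels, and the choice of a phone number as the fallback are hypothetical stand-ins for whatever a real deployment would use.

```python
from collections import Counter


class AuthTracker:
    """Pick an authentication method and tally how often each is used."""

    def __init__(self) -> None:
        self.counts = Counter()

    def choose_method(self, caller_has_ssn: bool) -> str:
        # Fall back to a phone number instead of treating the call as a failure.
        method = "ssn" if caller_has_ssn else "phone_number"
        self.counts[method] += 1
        return method

    def share(self, method: str) -> float:
        """Fraction of calls so far that used the given method."""
        total = sum(self.counts.values())
        return self.counts[method] / total if total else 0.0


tracker = AuthTracker()
for has_ssn in [True] * 7 + [False] * 3:
    tracker.choose_method(has_ssn)

print(f"{tracker.share('phone_number'):.0%} of callers used the alternate path")
```

Because the tally is updated at the moment the path is chosen, the figure an enterprise needs is available from the logs without any after-the-fact analysis.
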
When McKienzie localized Travelocity’s Argentinean application, she found a native voice talent to record the prompts. "The company I usually use for voice talents said, ‘We don’t have anyone who’s Argentinean, but so-and-so can do the accent!’" McKienzie recalls. "I said no. I wanted a native speaker."

Even expatriates can be inadequate; living outside of the native country, an expatriate voice talent will inevitably acquire a slight accent that distinguishes her from the target populace.

Staying Neutral
While regional differences can justify stronger accents—Mexican for an IVR deployed in California, or Caribbean for a system in Florida—they can be problematic for a national company with widespread customers. What sort of Spanish accent should a national institution deploy?

"You have to aim for the middle of the road, which is tough because there’s no such thing as neutral Spanish," McKienzie says. "People ask all the time, ‘Can’t you just use neutral Spanish?’"

Reager both agrees and disagrees that there isn’t neutral Spanish. Ultimately, the neutral version of any accent is the version that offends the fewest people.

"If you tell me you’re going to do a system in Urdu, at the very least I’d research if there were different dialects of Urdu," Elizondo says. "Is there a standard form that is understood by people who speak the different dialects? Are there sociolinguistic considerations or stigmas or connotations that come with a particular dialect when spoken by a voice talent? Which is used by the newscasts in that language?"

Mucho Dinero
The biggest deterrent for enterprises needing to localize is the cost. Depending on the size of the script, just the translation and code can cost $100,000.

"One of the other problems that people have is that code, especially on the incoming, cannot be translated by amateurs," Reager says. Enterprises, she adds, can essentially be locked out of their own systems.

To remedy this, Reager and her team launched an online software service called the System Localizer. Currently applied to products from Cisco Systems, Telecom Italia, and Bell Canada, Reager’s invention separates the language and uses mathematical algorithms to automatically localize strings of code. The ultimate goal is to strip out 90 percent of the costs in translating a system.

But if any figures justify localization for American companies targeting Americans, perhaps they are those in the Language Spoken at Home dataset, published every decade since 1980 by the U.S. Census Bureau. As of 2000, the dominant language was English. However, 10.7 percent of Americans spoke Spanish at home. As of July 2006, Hispanics constituted 15 percent of America’s total population. They accounted for half of the population growth between July 2005 and July 2006. That’s a huge customer base, especially for a national company.

And the international community is becoming increasingly inclusive, such that there really aren’t a lot of merits for an enterprise walled behind a single language. "Thailand and Korea, who are right next door, they’re all starting to intermingle, to use systems in three or four languages," Reager says. In June 2007, the European Union offered free public access to the InterActive Terminology for Europe database, containing more than 8.7 million terms in 23 official European languages. Development of the database cost more than $2.2 million, and maintenance costs in 2007 alone were more than $979,000.

"Europe knows how important speech recognition and auto-translation are to their future, and they’re investing in it," Reager says. "This is just the wake-up call for America to go global as quickly as possible."

Tricks of the Trade
Localizing a system requires careful attention to detail, but, if done correctly, raises customer satisfaction and allows enterprises to tap into new markets. Here are some steps José Elizondo recommends that led to the successful creation of a second-language speech system designed by Nuance Communications for Bank of America:
 • Pay as much attention to the requirements gathering for the foreign language as for the English.
 • Use every important source of information to make decisions in addition to the knowledge of experts in the field: caller data, marketing data, focus groups, agent feedback, etc.
 • Follow the speech team’s recommendations on how to adapt the design to better serve non-English-speaking customers.
 • Pay a lot of attention to the development of terminology that would be understood by all dialects.
 • Conduct thorough testing by your target demographic and be responsive to all issues that arise during that process.
 • Be responsive to design decisions that are based on target demographic customer behaviors and preferences.
 • Conduct a comprehensive usability study (in advance of deployment). Within the localized system’s target demographic, the study should include a significant number of participants from different backgrounds and customer profiles.
 • Maintain ongoing efforts to improve the system based on tuning data of millions of calls.
 • Consider all of the external factors to the speech system (agent training, user guides, literature, etc.), not just the typical IVR messages.
