The Call Heard 'Round the World

A report from comScore Network, an Internet rating service, found that 14 of the top 25 Web sites based in the United States were visited more by users outside the country than by users in it. Well-recognized brands, such as Google, Microsoft, and Yahoo!, received fewer than 25 percent of their Web visitors from within the U.S. As a result, companies in just about every industry now want to reach potential customers who are located not only down the street but also across the ocean.

The first step in reaching these potential clients is speaking with them. Traditionally, enterprises viewed English speech user interfaces as the bridge to potential customers, but that outlook is changing. Customers feel most comfortable when they can communicate in their native tongues, and in a growing number of cases, that language is no longer English. This realization is changing the design requirements for user interfaces in speech systems, such as interactive voice response (IVR), text-to-speech synthesis (TTS), and unified messaging solutions.

Rather than providing interfaces only in English, a growing number of corporations are now starting to build multilingual systems. "About 20 percent of our customers now add Spanish capabilities to their systems, and 10 percent support a variety of languages," says Mark Manz, CEO at Worldly Voices, a recording company focused on computer telephony and Web applications.

While the desire to interact with customers in their native vernaculars is rising, the task of building such systems can be quite difficult. The requirement adds significant man hours to the design process and, therefore, a lot of money to a system’s ultimate price tag. "The cost of adding another language to a speech application can increase deployment costs by 60 percent to 80 percent," states Manish Sharma, director of speech solutions at Nortel.

Vendors are working on ways to lower the amount of work needed, and the price, but they are taking on a complex task, one that cannot be solved quickly and easily. "A company has to rewrite the software to accommodate the different grammar and sentence structures used in foreign languages," Manz maintains. A number of factors are pushing system suppliers to make the process of building multilingual systems simpler, but reaching that goal will require more time and effort.

Vendors are taking up the task because the need for multilingual speech systems has never been greater. In addition to the profound impact that the Internet has had in changing typical customer profiles, other factors are pushing companies to deploy multilingual systems.

The need to offer users more than one language option has always been clear in foreign countries, where companies need to make sure that their solutions support English as well as the native language. "Now, even U.S. companies understand that they need to have multilingual systems in order to market their products successfully," says Sue Ellen Reager, CEO at @International Services, which specializes in translation services and software.

A Multilingual Culture

Even within the U.S., interest in multiple language systems has been rising because the number of people who speak languages other than English has been growing dramatically over the last decade or so. According to the U.S. Census Bureau, the U.S. had 33 million people (11.8 percent of the total population) who were natives of other countries at the end of 2002. Of those, roughly 25.5 million were identified as speaking a language other than English at home. In comparison, in 1990, about 20 million people, or 7.9 percent of the population, were born in other countries.

In the United States and Canada, there are roughly 60 million people who speak languages other than English, @International Services’ Reager says. To reach these customers, companies need to communicate with them in a familiar tongue. "We have seen a number of financial institutions ask that their systems support more than one language," notes Edgar Leon, a speech technologist at West Interactive, a call center and IVR solutions supplier.

The need is even evident in local governments. For example, Rich Garrett, principal at consulting firm Reflex Consulting, has been working with state government officials in New Mexico to add Spanish capabilities to their voter registration and immigrant detention center IVR applications. To answer incoming calls, the voter registration department annually hires a number of temporary workers, a costly and inefficient process. The detention center inquiries are often routed to the records department, an area that was not set up to answer callers’ questions. Consequently, complaints often surface when the callers are not able to get their questions answered. In both instances, the local government thinks that installing new multilingual IVR systems will lower costs and improve customer services.

While organizations can often see the potential benefits from these systems, they are not always willing to ante up for them. "Many of our customers ask for multiple language capabilities, but once they see how much it costs, they decide not to install them," Nortel’s Sharma admits. Currently, adding such features means that companies almost double the price of their systems, and that’s too high a burden for many.

Cost Concerns

The cost is significant for a number of reasons. Up until recently, most speech technology suppliers took a tunnel vision approach to system design. "To fully support other languages, a vendor has to incorporate different languages into the core of its system," @International Services’ Reager says. "In most cases, that approach was not taken because the U.S. was home to the leading speech technology innovators and therefore the dominant market."

This outlook produced a ripple effect in the design of speech user interfaces. "U.S. developers would often be happy if they translated 80 percent of their user interfaces correctly into a foreign language," she continues. "Developers would never allow their English system to say: ‘Your reservation is for two day and a nights,’ but they would allow the international versions of their systems to play such messages."
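The garbled prompt in Reager's example is the kind of output that naive prompt concatenation tends to produce when number agreement is ignored. The following is a minimal Python sketch of the alternative, not any vendor's method: the function names are hypothetical, and the plural rule is a simplified English-only one.

```python
def count_phrase(count: int, singular: str, plural: str) -> str:
    """Return a count phrase whose noun agrees in number with the count."""
    noun = singular if count == 1 else plural
    return f"{count} {noun}"


def reservation_prompt(days: int, nights: int) -> str:
    """Assemble an English reservation prompt with number agreement."""
    return (
        "Your reservation is for "
        f"{count_phrase(days, 'day', 'days')} and "
        f"{count_phrase(nights, 'night', 'nights')}."
    )


print(reservation_prompt(2, 1))  # Your reservation is for 2 days and 1 night.
```

Other languages follow different agreement rules, so a rule like this one would likely have to be written per language rather than translated word for word.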

Gradually, such an outlook turned on the vendors. While the designers were not repulsed by mutilating foreign languages, existing and potential customers were often turned off by these interfaces. "Just think about the phrase ‘Press 1 for English, Press 2 for Spanish’," notes Gary Wright, president of Applied Speech Resources, a Dallas consulting firm. "Application developers would never think of putting such commands in Dutch, but you will often find them in English."

Competition has been changing suppliers’ outlooks as well. Traditionally, companies buying speech systems had few choices. Because most of the early speech application vendors were U.S. companies, foreign customers were left with the choice of either purchasing systems featuring bad translations or having no automated voice system at all. That is no longer the case because a number of foreign suppliers have become key speech technology vendors. Foreign companies, such as Acapela Group (Belgium), Alcatel-Lucent (France), Noetica (England), Speech Technology Center (Russia), and Telephonetics (England), have emerged, and they fully understand the need for multilingual systems.

Yet, delivering such solutions remains a challenge for a variety of reasons. Translating information from one language to another is a complex task, sometimes tripped up by a variety of semantic, denotative, and connotative nuances. The translation process can work well with very simple sentences constructed with fundamental vocabularies. The sentence "She is reading a book," for example, is easy to translate into another language because it has a concrete meaning, where each lexical item carries a literal meaning and nothing more. Problems arise with more complex sentences, or even with short phrases, especially when features such as word order change from one language to the next.

@International Services’ Reager notes that Spanish speakers construct their sentences differently than English speakers. There are even differences in the way street addresses are presented. In English, the order for an address is street number, street name, street type, and street direction, as in 1234 Peachtree St. N.W. In Spanish, the order is different. They use street type, street name, street direction, and then number, so it would read like Street Peachtree N.W. 1234.
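The ordering difference can be expressed as data rather than as separate code per language. The sketch below is a minimal Python illustration under stated assumptions: the locale keys "en" and "es" and the function name are hypothetical, and the component orders are simply the two described above. It only reorders the components and does not translate them.

```python
# Component order per language, as described in the article.
# The locale keys "en" and "es" are labels chosen for this sketch.
ADDRESS_ORDER = {
    "en": ("number", "name", "type", "direction"),
    "es": ("type", "name", "direction", "number"),
}


def format_address(parts: dict[str, str], locale: str) -> str:
    """Join address components in the order the given locale expects."""
    order = ADDRESS_ORDER[locale]
    # Skip components that are missing or empty (e.g., no street direction).
    return " ".join(parts[key] for key in order if parts.get(key))


address = {"number": "1234", "name": "Peachtree", "type": "St.", "direction": "N.W."}
print(format_address(address, "en"))  # 1234 Peachtree St. N.W.
print(format_address(address, "es"))  # St. Peachtree N.W. 1234
```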

Another area affected is verbs. Verb tenses that are clear in English can have a variety of possible expressions in Spanish. Polite commands in Spanish are spoken as infinitives (so that "choose" becomes "to choose"), yet even here, in some cases, users may expect a different form of the verb, equivalent to "you choose." Spanish, like many other languages, also uses different verb constructions for talking to someone in a formal or a familiar context. The "you choose" version is an imperative command, a demand or requirement, which many Spanish speakers consider unfriendly and inappropriate. Under these circumstances, picking the right verb form among a handful of options can become difficult.

Such issues come to the fore in other languages as well. Since the Japanese sentence structure follows a subject-object-verb order and the English sentence structure is subject-verb-object, moving from one to the other can produce sentences that sound awkward and stilted. Also in Japanese, there are no definite articles and nothing equivalent to the word "the." Possible alternatives are kono, sono, or ano, but a literal dictionary definition lists these words as "these," "those," and "those over there," respectively. So when performing a translation, a voice application might first search the Japanese lexicon for the meaning of "the," find none, and replace it with one of the three alternatives, say kono. By doing so, the sentence’s semantics change drastically: rather than "the fish," the sentence talks about "these fish."

Another difference is that Japanese generally does not have a future tense, except in uses such as weather forecasts. Japanese speakers rely purely on context to determine whether "will eat" or "eat" is meant, a phenomenon that does not occur in English. Gender raises another set of issues. In English, there is a difference between male and female, but that is not the case with Japanese. One last distinction is that Japanese uses different words to count flat objects and cylindrical objects, something not found in English.

Idioms also create problems. Because its meaning is metaphorical, the sentence "The old man kicked the bucket" is quite difficult, if not impossible, to translate into other languages while maintaining its true flavor.

Dealing with Dialects

The complexity of making suitable translations also extends to regional dialects. While someone in New England and someone in Alabama may both speak English, there are connotations that may be familiar to one and unfamiliar to the other. The word "tonic," for example, generates an image of soda to the former and of medicine to the latter.

This phenomenon occurs in other languages as well. For instance, there are more than a dozen different dialects of Spanish, and words and phrases can mean different things to different people, depending on their country of origin. "In one area of New Mexico there is a community that speaks 15th Century Spanish, and that creates a challenge when designing a speech user interface," Reflex Consulting’s Garrett says.

In effect, dialects illustrate that language is not a mere collection of words and grammatical rules but rather the full expression of a culture. They embody the efforts of a community to interpret the world as well as human experiences. As a result, language reflects the complex personality of each community, and the differences from community to community affect word usage and word selection when a caller is trying to convey a desire or a response to an automated prompt.

Further exacerbating the problem, some dialects do not even fall into specific language genres. "There are a lot of individuals living in the U.S. who speak Spanglish," Applied Speech Resources’ Wright says. In this case, a person’s native Spanish takes on words, expressions, and grammar patterns common to English, and vice versa.

The end result is that language can only be interpreted and learned in reference to its specific cultural context. Translations that do not take these idiosyncrasies into account will invariably form non-existent relationships between the source and target languages.

Automated translations are often based on strict literal knowledge of the target language, but they may not work accurately as words move from one language to another. Quite frequently, the nuances found in the meaning of the words and sentences can get lost as a result of the translation from one language to another. That is because while a computer can be programmed with standard grammatical rules and structures, it does not possess the proficiency necessary to deal with exceptional cases or phrases. To get close to the meaning illustrated in a native tongue, human interaction (often quite a lot of it) is required, which is one reason that the cost of multilingual systems is so high.

Another problem has been the underlying infrastructure used with voice applications. Traditionally, voice components were designed in autonomous fashion based on closed systems. As a result, building voice applications has been a difficult and expensive process. Each time a company built a speech user interface, it had to be connected to a variety of back-end data stores, a task that entailed working with a number of complex, proprietary interfaces. Companies have been building their core engines in English, and that creates problems whenever users want to translate an application into another language. If a company decided to take on that work and build a system that worked with Spanish, the process would have to be repeated to have the system communicate with French customers. In effect, speech user interface development work has proceeded in a linear fashion with little to no reuse of existing code.
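One common way to avoid repeating the work for each language is to keep prompt wording in a per-language catalog and leave the call-flow code untouched. The Python sketch below is a minimal illustration of that idea, not a description of any product named in this article: the catalog, its keys, and the function name are all hypothetical.

```python
from string import Template

# Hypothetical prompt catalog: one entry per language, same keys in each.
PROMPTS = {
    "en": {"balance": Template("Your balance is $amount dollars.")},
    "es": {"balance": Template("Su saldo es de $amount dólares.")},
}


def render_prompt(key: str, locale: str, **values: str) -> str:
    """Look up a prompt by key and locale, then fill in its values."""
    return PROMPTS[locale][key].substitute(**values)


print(render_prompt("balance", "en", amount="100"))  # Your balance is 100 dollars.
print(render_prompt("balance", "es", amount="100"))  # Su saldo es de 100 dólares.
```

Under this arrangement, supporting French would mean adding a "fr" entry to the catalog rather than rebuilding the application, although a template alone does not resolve the agreement, verb form, and dialect issues described earlier.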

Writing to Standards

Some of these problems have been gradually disappearing. In the past, speech user interface development work required a custom integration effort. The Internet has provided many benefits to developers, and foremost among them is making existing components and software more modular, and therefore more interchangeable. The use of open connectivity standards has made it simpler for developers to mix and match code. Many developers no longer work with proprietary interfaces but instead write to open-standard interfaces, such as Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), Java Speech API, Lightweight Directory Access Protocol (LDAP), Media Resource Control Protocol (MRCP), Speech Application Programming Interface (SAPI), and Voice eXtensible Markup Language (VoiceXML).

Another change is that vendors are now more aware of the need for multilingual speech systems. "U.S. systems are moving into bilingual territory (English/Spanish), and Canadian systems are becoming trilingual (English/French/Spanish)," @International Services’ Reager says. "These changes mean that the average system will not be English-only very quickly."

In addition, products are emerging that can help with the speech translation challenges. @International Services has designed the GlobalConcat System Localizer, a suite of utilities and modules that application designers can use to localize their system. GlobalConcat, which is compatible with a variety of speech recognition engines, automatically generates grammar code segments that help translate information into foreign languages.

Companies, such as Ectaco and Systran, have also been working on translation engines. While these products are becoming more common, they have their limitations. "If a company uses a translation engine, it will typically generate a generic response," Garrett explains. "If you want a true translation, the only option is custom development."

While they still are far from perfect, a growing number of speech user interfaces are becoming better able to support multilingual features. This change in design should help make the companies that use them more efficient.

Surveys have shown that marketing and sales efforts are most effective when companies have localized their speech user interfaces. To date, that has been a difficult and expensive process, one most enterprises have been unwilling to undertake. As changes in speech system design take hold, more and more users may soon be greeted by systems that say, "Press 1 for English, and Prensa dos pues español."
