August 30, 1998
How Speech Can be Used for Multi Language IVR Applications
International businesses are moving forward with telephony-based spoken language applications. Of course, they also want one application to serve the language demographics of all their customers. That is easy to do with touch-tone systems: the tone of a caller pressing a one in the United States sounds pretty much the same as in Italy, France, and Mexico.
But does that mean the touch tone is the only option for a company that does business with customers who speak many different languages? Hardly. Speaking naturally is a much easier way to communicate than drilling through a series of touch-tone options on a telephone keypad.
What factors and limitations are involved with multi-lingual spoken language applications? Here are five important factors that need to be considered.
Speech Recognizer Selection
Speech recognizer vendors vary widely in their support for foreign languages. Among those who support a diverse collection of European and Asian languages are Philips and Lernout & Hauspie (L&H). Philips SpeechPearl supports English, French, German, and Dutch (others on request), and L&H ASR supports English (American and British), Spanish, French, German, Italian, Dutch, Korean and Japanese. Other recognizer vendors support smaller lists of foreign languages.
Speech recognizers may differ significantly in their ability to perform well with all the languages they claim to support. In view of this, it may become necessary to use one vendor for one language and a different vendor for another. For example, it may be appropriate for an application to use L&H for Italian and Philips SpeechPearl for German. Support of such a multi-vendor environment can prove to be extremely cumbersome unless tools are used which facilitate cross-vendor development.
Grammar Development
Grammar development for a foreign language must be done by individuals sufficiently familiar with that language to ensure they can anticipate the full spectrum of paraphrases the application is likely to encounter.
If an application has already been developed in one language and is then translated into an additional one, the tokens or tags assigned during grammar development for the first language can be reused with each additional one. This permits the interactive voice response (IVR) dialog call-flow processor to remain essentially unchanged as the application is migrated from language to language.
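The token scheme described above can be sketched as follows. This is a hedged illustration, not a real product API: the grammar phrases, token names, and call-flow states are all invented for the example. The point is that each language's grammar maps its own phrases to the same language-independent tokens, so the dialog logic never changes.

```python
# Illustrative sketch: phrases differ per language, tokens do not.
GRAMMARS = {
    "english": {"check my balance": "BALANCE", "transfer funds": "TRANSFER"},
    "german": {"kontostand abfragen": "BALANCE", "geld ueberweisen": "TRANSFER"},
}

def interpret(language, utterance):
    """Map a recognized utterance to its language-independent token."""
    return GRAMMARS[language].get(utterance.lower(), "UNKNOWN")

def next_state(token):
    """Call-flow logic keyed only on tokens -- identical for every language."""
    return {"BALANCE": "play_balance", "TRANSFER": "ask_amount"}.get(token, "reprompt")
```

Because `next_state` sees only tokens, porting the application to a new language means adding one more grammar entry, not touching the call flow.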
Call Flow Design
An important first question: does the IVR platform support a recognizer, or recognizers, for the languages your application needs? Many IVR platforms offer very limited language choices.
Overall, the logic of a speech enabled IVR application need not change when an application is ported to several languages, unless the nature of the application itself changes when presented to the foreign community. For example, a financial application might require minor modification to accommodate different ways to identify currencies or other numeric values.
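As a sketch of such a minor modification: the shared call flow might delegate only the rendering of an amount to per-language formatters. The currency names and separator conventions below are illustrative assumptions, not part of any product.

```python
# Illustrative only: logic stays shared, formatting is language-dependent.
def format_english(amount):
    """Render an amount with English grouping, e.g. 1,234.50 dollars."""
    return f"{amount:,.2f} dollars"

def format_german(amount):
    """Render an amount with German grouping, e.g. 1.234,50 Mark."""
    text = f"{amount:,.2f}"
    # Swap '.' and ',' to follow German digit-grouping conventions.
    return text.replace(",", "_").replace(".", ",").replace("_", ".") + " Mark"

CURRENCY_FORMATS = {"english": format_english, "german": format_german}
```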
If an application is planned for deployment in a multi-lingual community such as Western Europe, a natural extension to the call flow would be the added capability to permit the caller to select the desired language. Thus the prompt might be: "For English, please say English; für Deutsch, bitte sagen Sie Deutsch; pour français, dites français s'il vous plaît."
The application would then process recognizer output to determine the desired language, then switch in the appropriate recognizer and speech interpreter. The recognizer used to process the response to the language choice question could be any recognizer for which the non-native words were modeled.
In other words, an English recognizer could be taught to understand the words Deutsch and français by modeling the pronunciation of these words just as you would model an uncommon technical term or name.
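The language-choice dispatch described above might look like this sketch. The recognizer handles in the table are hypothetical names, not real components; the assumption is only that each language's name has been phonetically modeled in the recognizer that asks the question.

```python
# Hypothetical dispatch table: language word -> recognizer to switch in.
LANGUAGE_RECOGNIZERS = {
    "english": "en-recognizer",
    "deutsch": "de-recognizer",
    "francais": "fr-recognizer",
}

def select_recognizer(recognized_word):
    """Pick the recognizer for the language the caller named, or None."""
    word = recognized_word.lower().replace("ç", "c")
    return LANGUAGE_RECOGNIZERS.get(word)
```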
Recognizer Output Interpretation
Recognizer output interpretation refers to the process by which the many outputs produced by a recognizer on any turn of a dialog are sorted and categorized to identify the user's intended meaning.
If semantic categorization is performed by associating tags or tokens with the various pathways through the recognizer grammar, then the interpretation process remains unchanged across various languages, except for the processing of variables.
A variable is a frequently used word or phrase for which parsing and analysis can be readily predefined. Variables include inputs such as times, dates, numbers, and strings.
A variable processor converts a phrase such as "June twentieth ninety-eight" into 06/20/98, or "four thousand and six" into 4,006. Clearly, a variable processor is language-dependent, and therefore its vocabulary and parsing capability must be ported to each new target language.
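A minimal sketch of such a variable processor, handling just the two examples above. The word tables are assumptions that cover only these phrases; a production variable processor would carry a full vocabulary and grammar for each target language.

```python
# Minimal illustrative variable processor; tables cover only the examples.
MONTHS = {"june": 6}
ORDINAL_DAYS = {"twentieth": 20}
YEARS = {"ninety-eight": 98}
UNITS = {"four": 4, "six": 6}

def parse_spoken_date(phrase):
    """Convert e.g. 'June twentieth ninety-eight' into '06/20/98'."""
    month, day, year = phrase.lower().split()
    return f"{MONTHS[month]:02d}/{ORDINAL_DAYS[day]:02d}/{YEARS[year]:02d}"

def parse_spoken_number(phrase):
    """Convert e.g. 'four thousand and six' into 4006."""
    total = current = 0
    for word in phrase.lower().split():
        if word in UNITS:
            current += UNITS[word]
        elif word == "thousand":
            total += current * 1000
            current = 0
        # filler words such as 'and' are ignored
    return total + current
```

Porting this processor to German or French means replacing the word tables and split logic, while the dates and integers it emits stay in one canonical form for the application.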
Prompt Management
Speech-enabled IVR applications will typically include a sizeable number of pre-recorded audio prompts that the application knits together to produce the output customers hear. Prompts are usually .vox or .wav files named by the application designer. Multi-lingual capability can be facilitated by using the same file names regardless of the language the application is supporting, then collecting the files for each language in an appropriately named directory. Thus, ...\english\two.wav might contain a recording of "two," while ...\german\two.wav might contain "zwei."
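The directory convention above can be sketched in a few lines; the root directory name here is an assumption for illustration. Only the language directory changes, never the file name, so prompt-playback code stays language-independent.

```python
from pathlib import Path

# Assumed root of the per-language prompt directories.
PROMPT_ROOT = Path("prompts")

def prompt_path(language, prompt_name):
    """Same file name in every language; only the directory differs."""
    return PROMPT_ROOT / language / f"{prompt_name}.wav"
```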
The parameters and choices scale up rapidly with these applications; they become very complex very quickly. International businesses need tools that neutralize these complexities and integrate with IVR platforms and products from vendors such as MediaSoft Telecom, Parity Software, Periphonics, Pronexus, International Public Access Technologies, Artisoft, and others.
Such tools give a designer a way to validate and simulate an application before coding, investing in an IVR platform, or choosing a speech recognizer. For the developer, the tools provide one common interface for developing grammars for any context-free-grammar-based recognizer. This frees developers from learning a different grammar syntax for each recognizer and gives them the freedom to choose the best recognizer for each country. As a by-product of this process, all of the speech interpretation code required for the runtime application is automatically produced.
So what does this mean? Our clients are reporting significant reductions in development time (up to two-thirds) by using the NL Speech Assistant tool suite, and they have applications that are isolated from the speech recognizer. As a result, international businesses are now able to break the telephony language barrier.
Rick Barchard is the Director, Unisys NL Group, and can be reached at 610-648-7065. Bill Scholz, Ph.D., is the Director, Unisys NL Software Architecture, and can be reached at 610-648-3824.