August 30, 1998
By Richard Barchard, Bill Scholz President - NewSpeech LLC

How Speech Can be Used for Multi Language IVR Applications

International businesses are moving forward with telephony-based spoken language applications, and they want a single application to serve the language demographics of all their customers. That is easy to do with touch tone systems: a caller pressing one in the United States produces the same tone as a caller pressing one in Italy, France, or Mexico.

But does that mean touch tone is the only option for a company that does business with customers who speak many different languages? Hardly. Speaking naturally is a much easier way to communicate than drilling through a series of touch tone options on a telephone keypad.

What factors and limitations are involved with multi-lingual spoken language applications? Here are five important factors to consider.

Speech Recognizer Selection

Speech recognizer vendors vary widely in their support for foreign languages. Among those who support a diverse collection of European and Asian languages are Philips and Lernout & Hauspie (L&H). Philips SpeechPearl supports English, French, German, and Dutch (others on request), and L&H ASR supports English (American and British), Spanish, French, German, Italian, Dutch, Korean and Japanese. Other recognizer vendors support smaller lists of foreign languages.

Speech recognizers may differ significantly in how well they perform across the languages they claim to support. It may therefore become necessary to use one vendor for one language and a different vendor for another. For example, it may be appropriate for an application to use L&H for Italian and Philips SpeechPearl for German. Supporting such a multi-vendor environment can prove extremely cumbersome unless tools that facilitate cross-vendor development are used.

Grammar Development

Grammar development for a foreign language must be done by individuals sufficiently familiar with that language to ensure they can anticipate the full spectrum of paraphrases the application is likely to encounter.

If an application has already been developed in one language and is then translated into another, the tokens or tags assigned during grammar development for the first language can be reused for each additional language. This permits the interactive voice response (IVR) dialog call-flow processor to remain essentially unchanged as the application is migrated from language to language.
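
The token-sharing idea can be sketched in Python. This is a minimal illustration under stated assumptions, not any vendor's grammar format: the phrase tables, the tag names CHECK_BALANCE and TRANSFER_FUNDS, the dialog state names, and the functions interpret and next_state are all hypothetical.

```python
# Hypothetical phrase-to-tag tables. Real recognizer grammars are far richer,
# but the principle is the same: every language maps onto one shared tag set.
GRAMMARS = {
    "english": {
        "check my balance": "CHECK_BALANCE",
        "what is my balance": "CHECK_BALANCE",
        "transfer money": "TRANSFER_FUNDS",
    },
    "german": {
        "kontostand abfragen": "CHECK_BALANCE",
        "wie ist mein kontostand": "CHECK_BALANCE",
        "geld überweisen": "TRANSFER_FUNDS",
    },
}

# The call flow is keyed by tag only, so it is shared by every language.
CALL_FLOW = {
    "CHECK_BALANCE": "balance_menu",
    "TRANSFER_FUNDS": "transfer_menu",
}


def interpret(language, utterance):
    """Return the tag for a recognized phrase, or None if it is out of grammar."""
    return GRAMMARS[language].get(utterance.strip().lower())


def next_state(language, utterance):
    """Map an utterance to the next dialog state via its language-neutral tag."""
    tag = interpret(language, utterance)
    return CALL_FLOW.get(tag, "reprompt")
```

Adding a language in this sketch means adding one entry to GRAMMARS; CALL_FLOW and next_state are untouched.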

Call Flow Design

The first question is whether the IVR platform supports a recognizer, or recognizers, covering the languages your application needs. Many IVR platforms offer very limited language choices.

Overall, the logic of a speech-enabled IVR application need not change when the application is ported to several languages, unless the nature of the application itself changes when it is presented to a different community. For example, a financial application might require minor modification to accommodate different ways of identifying currencies or other numeric values.

If an application is planned for deployment in a multi-lingual community such as Western Europe, a natural extension to the call flow would be the added capability to permit the caller to select the desired language. Thus the prompt might be “for English, please say ‘English’; für Deutsch, bitte sagen Sie ‘Deutsch’; pour français dites ‘français’ s’il vous plaît.”

The application would then process recognizer output to determine the desired language, then switch in the appropriate recognizer and speech interpreter. The recognizer used to process the response to the language choice question could be any recognizer for which the non-native words were modeled.

In other words, an English recognizer could be “taught” to understand the words ‘Deutsch’ and ‘français’ by modeling the pronunciation of these words just as you would model an uncommon technical term or name.
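
The language-selection step might look something like the following Python sketch. It assumes the recognizer returns plain text for the language-choice prompt; the Session class, the LANGUAGE_WORDS table, and the language keys are hypothetical names introduced only for illustration.

```python
# Hypothetical mapping from the word the caller says to a language key.
# The recognizer handling this prompt is assumed to have all three words modeled.
LANGUAGE_WORDS = {
    "english": "english",
    "deutsch": "german",
    "français": "french",
    "francais": "french",  # tolerate recognizer output without the cedilla
}


class Session:
    """Tracks which language's recognizer and interpreter a call should use."""

    def __init__(self, default_language="english"):
        self.language = default_language

    def select_language(self, recognized_text):
        """Switch language if the caller named one; return True on success."""
        choice = LANGUAGE_WORDS.get(recognized_text.strip().lower())
        if choice is None:
            return False  # unrecognized answer: keep the current language
        self.language = choice
        return True
```

In a real application, the value of language would then determine which recognizer and speech interpreter are loaded for the rest of the call.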

Recognizer Output Interpretation

“Recognizer output interpretation” refers to the process by which the large number of outputs produced by a recognizer, on any turn of a dialog, are sorted and categorized in order to identify the user’s intended meaning.

If semantic categorization is performed by associating tags or tokens with the various pathways through the recognizer grammar, then the interpretation process remains unchanged across various languages, except for the processing of ‘variables.’

A ‘variable’ is a frequently used word or phrase for which parsing and analysis can be readily predefined. Variables include inputs such as times, dates, numbers, and strings.

A variable processor converts a phrase such as “June twentieth ninety-eight” into “06/20/98,” or “four thousand and six” into “4,006.” Clearly, a variable processor is language-dependent and therefore its vocabulary and parsing capability must be ported to each new target language.
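
As a rough illustration, the two conversions above can be sketched in Python for English only. The function names parse_number and parse_date and the word tables are hypothetical, and the sketch assumes a two-digit spoken year such as “ninety-eight”; a German or French processor would need its own vocabulary and parsing rules.

```python
# English-only word tables; each target language needs its own equivalents.
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}
MONTHS = {w: i for i, w in enumerate(
    "january february march april may june july august september october "
    "november december".split(), start=1)}
ORDINALS = {w: i for i, w in enumerate(
    "first second third fourth fifth sixth seventh eighth ninth tenth "
    "eleventh twelfth thirteenth fourteenth fifteenth sixteenth seventeenth "
    "eighteenth nineteenth".split(), start=1)}
ORDINALS.update({"twentieth": 20, "thirtieth": 30})


def parse_number(phrase):
    """Convert a spoken English cardinal such as 'four thousand and six' to an int."""
    total, current = 0, 0
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unknown number word: {word}")
    return total + current


def parse_date(phrase):
    """Convert 'June twentieth ninety-eight' to 'MM/DD/YY' (two-digit year assumed)."""
    words = phrase.lower().replace("-", " ").split()
    month = MONTHS[words[0]]
    day = 0
    for i, word in enumerate(words[1:], start=1):
        if word in ORDINALS:
            day += ORDINALS[word]
            year_words = words[i + 1:]
            break
        if word in TENS:
            day += TENS[word]  # e.g. "twenty" in "twenty first"
        else:
            raise ValueError(f"unexpected word in day: {word}")
    else:
        raise ValueError("no day found")
    year = parse_number(" ".join(year_words))
    if not 0 <= year <= 99:
        raise ValueError("this sketch only handles two-digit spoken years")
    return f"{month:02d}/{day:02d}/{year:02d}"
```

Formatting parse_number("four thousand and six") with format(value, ",") gives the “4,006” form used above.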

Prompt Development

Speech enabled IVR applications will typically include a sizeable number of pre-recorded audio prompts which will be knitted together by the application to produce the output the customers hear. Prompts are usually “.vox” or “.wav” files that are named by the application designer. Multi-lingual capability can be facilitated by using the same file names regardless of what language the application is supporting, then collecting files for a language in an appropriately named directory. Thus, ...\english\two.wav might contain a recording of “two,” while ...\german\two.wav might contain “zwei.”
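
A minimal Python sketch of this naming scheme follows. The root directory name "prompts", the prompt name "you_have", and the functions prompt_path and prompt_sequence are assumptions made for illustration; forward slashes are used so the example behaves the same on any platform.

```python
from pathlib import PurePosixPath


def prompt_path(root, language, prompt_name, extension=".wav"):
    """Build the path to one prompt file: same file name, per-language directory."""
    return PurePosixPath(root) / language / (prompt_name + extension)


def prompt_sequence(root, language, prompt_names):
    """Return the ordered list of files to play for a multi-part prompt."""
    return [str(prompt_path(root, language, name)) for name in prompt_names]
```

Because only the language directory changes, the application can request the same sequence of prompt names for every caller and let the selected language decide which recordings are played.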

The parameters and choices scale up rapidly with these applications, which become very complex very quickly. International businesses need tools that neutralize these complexities and integrate the resulting applications into IVR platforms and products from vendors such as MediaSoft Telecom, Parity Software, Periphonics, Pronexus, International Public Access Technologies, Artisoft and others.

The tools give a designer a way to validate and simulate an application before coding, investing in an IVR platform, or choosing a speech recognizer. For the developer, the tools provide one common interface for developing grammars for any context-free grammar based recognizer. This frees developers from learning a different grammar syntax for each recognizer and leaves them free to choose the best recognizer for each country. As a by-product of this process, all of the speech interpretation code required for the runtime application is produced automatically.

So what does this mean? Our clients are reporting significant reductions in development time (up to two-thirds) by using the NL Speech Assistant tool suite, and they have applications that are isolated from the speech recognizer. As a result, international businesses are now able to break the telephony “language barrier.”

Rick Barchard is the Director, Unisys NL Group, and can be reached at 610-648-7065 or richard.barchard@unisys.com. Bill Scholz, Ph.D., is the Director, Unisys NL Software Architecture, and can be reached at 610-648-3824 or bill.scholz@unisys.com.
