Going International with Speech Applications

Developing speech applications for the expanding world market involves not only deploying the significant technological improvements of recent years, but also addressing a series of considerations uniquely encountered in multi-language speech products.

These products need to satisfy not only global objectives and requirements, but also requirements of a national and regional nature.

For products with well-defined requirements, integrated development is best served by a simultaneous worldwide product release rather than by staging sequential releases across the multiple languages. However, for products that are testing and evaluating new market requirements, it is most expedient and economical to rely on preliminary development for a single country before developing the product worldwide.

Some of the important issues to consider in the development of worldwide speech products include understanding any unique country-specific requirements, iterative human factors design, continual test and evaluation, speech technology selection, tight development controls, an intuitive “out-of-box” experience, localized screen designs and packaging, and development efficiencies.

Define Product Requirements

The first step in targeting a worldwide market is to clearly differentiate those requirements that are generic in nature from those that are country-specific. In addition to country-specific requirements, there may be unique requirements for specialized or vertical markets. Clearly defining the requirements in relation to generic, country-specific, and specialized or vertical markets lays a solid foundation for developing an overall system structure. A case in point is the differing character sets across languages worldwide: the specific keyboards required to support specialized character sets need to be planned for during the development and translation process.

To complete a detailed view of the individual country requirements, it is helpful to carry out a market segmentation that clearly maps each country and application area to its unique requirements. This segregation of requirements is not necessarily easy or obvious. An obvious approach in one country or specialized vertical market may not be a desirable approach for a generic product. Early identification and testing are crucial in developing a simultaneous worldwide product.

Specific Country Requirements

With a clear understanding of what is generic and what needs to be translated, it is very important to develop the requirements so that language-specific strings are stored in appropriate resource files for translation and not embedded in executable code.
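As a rough illustration of keeping translatable strings out of executable code, the sketch below looks up prompts by key in per-locale tables. The table contents, the locale tags, and the `get_string` helper are all hypothetical; in a real product each table would typically live in its own resource file handed to translators.

```python
# Hypothetical per-locale string tables. In practice each table would be
# loaded from a separate resource file rather than defined inline.
RESOURCES = {
    "en-US": {
        "greeting": "Welcome. Please say a command.",
        "retry": "Sorry, I did not understand. Please repeat.",
    },
    "de-DE": {
        "greeting": "Willkommen. Bitte sagen Sie einen Befehl.",
        # "retry" has not been translated yet.
    },
}

DEFAULT_LOCALE = "en-US"


def get_string(key, locale):
    """Return the prompt for `key` in `locale`, falling back to the default locale."""
    table = RESOURCES.get(locale, {})
    if key in table:
        return table[key]
    return RESOURCES[DEFAULT_LOCALE][key]


if __name__ == "__main__":
    print(get_string("greeting", "de-DE"))
    print(get_string("retry", "de-DE"))  # falls back to the en-US text
```

Because the application code refers only to keys such as `greeting`, translators can work on the tables without touching the program, and an untranslated key degrades to the default language rather than failing.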

Visual material for display should be reviewed early and finalized as soon as final function validation testing is complete. To avoid rework, it is usually best to delay translation until that testing is done. The objective when developing speech applications in multiple languages is to translate only once.

Human Factors

Proper human factors engineering involves extensive analysis of the way in which the user interacts with the application. Iterative testing and improvement of designs with target users is necessary to make the product as intuitive as possible.

As part of this analysis, it is very important to study the documentation, the expected application responses (especially when recognition errors occur), and specific national and regional linguistics. For example, words and phrases in a recognition vocabulary will likely not translate to the same meaning in all languages. Colloquial expressions, though a natural means of communication, seldom survive translation well unless the translator is an expert in the field for which the program is being developed. Speech patterns vary from country to country, and it is important to ensure they are handled properly in each country’s localization. For “33”, three users in different countries might say “three-three”, “thirty-three”, or “double-three”.
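The “33” example can be sketched as a small normalizer that maps each spoken form to the same digit string. This is only an English-language illustration under simplifying assumptions: the word lists, the `spoken_to_digits` name, and the handling of “double” and “triple” are invented for this sketch, and a production recognizer would normally express such variants in its per-country grammar.

```python
# Illustrative English-only vocabulary; a real product would define one per locale.
UNITS = {
    "zero": 0, "oh": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS = {
    "twenty": 2, "thirty": 3, "forty": 4, "fifty": 5,
    "sixty": 6, "seventy": 7, "eighty": 8, "ninety": 9,
}
REPEATS = {"double": 2, "triple": 3}


def spoken_to_digits(phrase):
    """Convert a spoken digit phrase such as 'double-three' to a digit string."""
    tokens = phrase.lower().replace("-", " ").split()
    digits = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok in REPEATS:
            # "double three" -> "33"
            if nxt not in UNITS:
                raise ValueError("expected a digit after " + tok)
            digits.append(str(UNITS[nxt]) * REPEATS[tok])
            i += 2
        elif tok in TENS:
            # "thirty three" -> "33"; a bare "thirty" -> "30"
            if nxt in UNITS and UNITS[nxt] != 0:
                digits.append(str(TENS[tok]) + str(UNITS[nxt]))
                i += 2
            else:
                digits.append(str(TENS[tok]) + "0")
                i += 1
        elif tok in UNITS:
            digits.append(str(UNITS[tok]))
            i += 1
        else:
            raise ValueError("unrecognized word: " + tok)
    return "".join(digits)


if __name__ == "__main__":
    for phrase in ("three-three", "thirty-three", "double-three"):
        print(phrase, "->", spoken_to_digits(phrase))
```

All three phrases resolve to the same string, which is the property the localization has to guarantee whichever speaking style a country prefers.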

Human factors are perhaps the most important consideration in developing any speech application, and this holds doubly true for speech applications developed in multiple languages. The importance of reliable human factors expertise cannot be overemphasized. If these skills and expertise are lacking, get experts and country locals to design and test your speech application.

Speech Technology Issues

Of course, embarking on the development of a multilingual speech application requires the proper speech technologies. Based on the specific interface and targeted countries, it is possible to quickly create a short list of those technologies that offer the proper function for each of the countries. But that is only the initial step.

Ideally, specific vocabularies should be localized for each country and perhaps even for each specialized, vertical area. Command words or phrases translated word-for-word may yield homophones that give poor recognition results compared with other command words or phrases within a specific language. Phrase structure is often different as well, so the key words that distinguish one command from another in one language cannot be relied on to achieve high accuracy in another language.

Another aspect, not unique to speech recognition, is the manner in which numbers, dates, and currencies are handled. The format for the date may be Month/Day/Year or Year/Month/Day.
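The date-format point can be illustrated with a small lookup of per-locale patterns. The locale tags and the pattern table below are illustrative assumptions (the Day/Month/Year entry is added here as a third common ordering), and a shipping product would more likely rely on a maintained locale library than on a hand-written table.

```python
from datetime import date

# Hypothetical pattern table keyed by locale tag; strftime codes are standard.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # Month/Day/Year
    "en-GB": "%d/%m/%Y",  # Day/Month/Year
    "ja-JP": "%Y/%m/%d",  # Year/Month/Day
}


def format_date(value, locale):
    """Format `value` using the pattern registered for `locale`."""
    return value.strftime(DATE_PATTERNS[locale])


if __name__ == "__main__":
    sample = date(1999, 3, 4)
    for locale in DATE_PATTERNS:
        print(locale, format_date(sample, locale))
```

The same calendar date reads as 03/04/1999 in one locale and 04/03/1999 in another, which is why both the spoken prompts and the on-screen text need locale-aware formatting rather than a fixed pattern.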

Create an Intuitive Tutorial

As natural as it is to talk, talking to an application is a new skill for the user, and often one that must be learned. Therefore, create an interactive, multimedia tutorial to teach the user how to interact with the speech application.

Unfortunately, developing a multimedia tutorial brings additional development problems. To minimize rework, it is best to wait until functional validation testing is complete before doing any screen captures. If there is audio material to be synchronized with the screen captures and other visual material, it is important to know that the time allocated for descriptive narrative in one language is probably not identical to that of another language. Hence, the pace of the overall tutorial might change considerably depending on the language in question. As a result, what may be a natural experience for a user in German may be unnatural for a user in French.
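One way to cope with narration lengths that differ by language is to derive each screen's display time from the recorded narration rather than fixing it once for all languages. The sketch below assumes the narration durations are already known per screen; the `build_timeline` helper, the padding value, and the sample durations are hypothetical.

```python
PADDING_SECONDS = 1.0  # assumed pause after each narration clip


def build_timeline(narration_seconds, padding=PADDING_SECONDS):
    """Return a (start, end) pair per screen so each screen lasts as long as its narration."""
    timeline = []
    start = 0.0
    for duration in narration_seconds:
        end = start + duration + padding
        timeline.append((start, end))
        start = end
    return timeline


if __name__ == "__main__":
    # Hypothetical narration lengths, in seconds, for the same two screens.
    narration = {
        "en-US": [4.0, 6.0],
        "de-DE": [5.0, 7.5],
    }
    for locale, durations in narration.items():
        print(locale, build_timeline(durations))
```

Generating the timeline per language keeps audio and visuals synchronized, although the overall pace still needs to be reviewed with native speakers.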

Deciding between a simultaneous worldwide release and sequential single-language releases requires weighing important distinctions and considerations. Issues not discussed here, but also important to consider, include the requirements for marketing, testing, training, and maintenance.

It is true that a simultaneous multi-language release will require more time than any individual language release. It is also true that the same simultaneous release effort will require less time than the sum of two individual releases (the exception being the translation from US English to UK English). In addition, benefits accrue from testing the different languages concurrently and from creating a clearly defined set of common code that truly supports all languages. The result is a relatively stable code base for managing versions, revisions, and maintenance.

Walt Nawrocki is President and CEO of Registry Magic. He was previously an executive with IBM where he was responsible for business management and product development of speech recognition products and offerings worldwide.

Registry Magic provides speech-driven, conversational applications and services that employ their Human User Interface™ technologies. Registry Magic products include their Virtual Operator™, a speech-driven, conversational office system, as well as their Virtual Employee™ and Customized Conversational Solutions℠ for call center and point of sale environments.
