April 30, 1997
By Mark Nickson
Features

The ABCs for IVR

There are three primary types of voice recognition: number recognition, which involves the numbers 0-9 and the words “yes,” “no” and “oh”; alphabet recognition, for the letters A to Z; and word recognition, in which application-specific words such as “operator,” “service” or “checking” are identified.

Number recognition is the most common type and the easiest to implement. It uses a standard vocabulary file, so the recognition engine needs no application-specific training.

Alpha recognition enables callers to speak the letters A through Z. It also uses a standard vocabulary file, which does not require training.

Word recognition allows callers to speak words that are specific to the application. It requires training the recognition engine for each word it needs to process.

This article will concentrate on alphabet recognition and give an example highlighting the implementation of alphabet recognition in an interactive voice response (IVR) application.

Although alpha recognition does not require the vocabulary to be trained, it does require the use of application-specific “lookup lists,” and it is only practical when “intelligence” is built into the IVR application’s overall design.

Alpha recognition will not work in a free-form manner. A caller cannot spell an arbitrary word and expect the recognizer to translate it accurately, because many letters of the alphabet sound similar, such as C, D, E and G. Instead, the spoken letters are compared, letter by letter, against the words in a lookup list.

For example, if a system needs to capture the first and last name of a person, it can ask the caller to spell both. The spelled names are then compared with each name in the lookup list, and the name that best matches is presented back to the caller for confirmation.

Achieving Alpha Recognition

To achieve alpha recognition, the following steps are necessary.

First, ask the caller to speak each letter of the word (e.g., spell the person’s name).

Second, prune the database of eligible words to only those that have the same number of letters as the spoken word (e.g., the same number of letters in the first name and the same number of letters in the last name).

Third, perform the actual comparison. Each eligible word in the pruned lookup list is accessed in turn, and each of its letters is compared with the corresponding letter of the spoken word. For each letter that matches, a point is earned; for each letter that does not match, no point is earned. After all eligible words are compared in this fashion, they are ranked from the highest score to the lowest score.

Finally, the word with the highest score is presented back to the caller for confirmation: “You said ‘Bill Clinton.’ Is this correct?” If it is not correct, the word with the next highest score is presented to the caller for confirmation.
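
The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code of any particular IVR product: the function names, the sample lookup list and the `ask` callback (which stands in for playing a prompt and capturing a yes/no answer) are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def prune_by_length(spoken: str, lookup_list: List[str]) -> List[str]:
    """Step 2: keep only words with the same number of letters as the spoken word."""
    return [word for word in lookup_list if len(word) == len(spoken)]


def score(spoken: str, candidate: str) -> int:
    """Step 3: one point for each position where the letters match, none otherwise."""
    return sum(1 for a, b in zip(spoken.upper(), candidate.upper()) if a == b)


def rank_candidates(spoken: str, lookup_list: List[str]) -> List[Tuple[str, int]]:
    """Score every eligible word and rank from highest score to lowest."""
    eligible = prune_by_length(spoken, lookup_list)
    scored = [(word, score(spoken, word)) for word in eligible]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def confirm(ranked: List[Tuple[str, int]],
            ask: Callable[[str], bool]) -> Optional[str]:
    """Step 4: offer candidates in ranked order until the caller says yes."""
    for word, _points in ranked:
        if ask(f"You said '{word}'. Is this correct?"):
            return word
    return None  # no candidate confirmed; the application decides what to do next


# Hypothetical lookup list; the recognizer is assumed to have heard
# D-I-L-L when the caller actually spelled B-I-L-L.
LOOKUP_LIST = ["BILL", "BETH", "JACK", "WILLIAM", "BOB"]
ranked = rank_candidates("DILL", LOOKUP_LIST)
```

Here “WILLIAM” and “BOB” are pruned because they do not have four letters, and “BILL” ranks first with three matching letters even though the first letter was misheard. Passing `ranked` to `confirm` would offer “BILL” to the caller before any other name.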

Intelligent application architecture and smart data collection design help alpha recognition a great deal. In some cases, pruning the lookup list narrows it down to only one choice, making alpha recognition unnecessary.

A system developed last year for a state Department of Public Health illustrates both intelligent application design and the alpha recognition pruning algorithm.

Callers are first asked to speak the patient’s date of birth, gender, birth order and mother’s date of birth. Each of the four data items is presented back to the caller for confirmation. This allows the database of all patients to be narrowed down. In some cases, the alpha recognition process is circumvented because only one record matches the date, gender and birth order data. In other cases, where multiple patient records remain in the lookup list, the small record set narrows the field of available choices in the comparison process so much that alpha recognition’s success rate is extremely high.
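
The narrowing step described above might look like the following Python sketch. The record layout, the field names and the sample records are invented for illustration only; the article does not describe the actual system’s schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class PatientRecord:
    # Hypothetical fields mirroring the four spoken data items plus the name.
    first_name: str
    last_name: str
    date_of_birth: date
    gender: str
    birth_order: int
    mother_date_of_birth: date


def narrow_records(records: List[PatientRecord], date_of_birth: date, gender: str,
                   birth_order: int, mother_date_of_birth: date) -> List[PatientRecord]:
    """Keep only records that match all four confirmed data items."""
    return [
        r for r in records
        if r.date_of_birth == date_of_birth
        and r.gender == gender
        and r.birth_order == birth_order
        and r.mother_date_of_birth == mother_date_of_birth
    ]


def needs_alpha_recognition(matches: List[PatientRecord]) -> bool:
    """With exactly one match, the caller never has to spell the name."""
    return len(matches) > 1


# Made-up sample data.
RECORDS = [
    PatientRecord("ANNA", "SMITH", date(1995, 3, 14), "F", 1, date(1970, 6, 2)),
    PatientRecord("ERIN", "JONES", date(1996, 7, 1), "F", 1, date(1968, 1, 20)),
    PatientRecord("ERIC", "STONE", date(1996, 7, 1), "M", 1, date(1968, 1, 20)),
    PatientRecord("ELLA", "JAMES", date(1996, 7, 1), "F", 1, date(1968, 1, 20)),
]

unique = narrow_records(RECORDS, date(1995, 3, 14), "F", 1, date(1970, 6, 2))
several = narrow_records(RECORDS, date(1996, 7, 1), "F", 1, date(1968, 1, 20))
# Only the surviving names form the lookup list for the spelled-name comparison.
lookup_list = [r.last_name for r in several]
```

In the first query a single record remains, so alpha recognition can be skipped. In the second, two records remain, and the spelled name only has to be distinguished between those two.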

Mark Nickson, of DAC Systems, 60 Todd Rd., Ste. 113, Shelton, CT 06484, can be reached at 203 924-7000.
