It’s Not the Recognizer
Examining speech application errors reveals several more likely causes.
Posted Sep 9, 2010

Speech-enabled applications can offer a wide array of benefits to an enterprise’s telephony environment. Reducing the cost of telephony operations, providing a consistent and high-quality caller experience, and enabling today’s mobile workforce are just a few of the ways speech recognition has changed modern business communications for the better.

That said, the success of any speech-enabled call routing solution is directly proportional to its ability to handle callers accurately and consistently. As success rates for servicing callers decline, those benefits decline with them, effectively negating the value of the original investment.

Traditionally, blame for these declining success rates has fallen at the feet of a single system component — the speech recognizer. When a caller interacts with a speech-enabled call routing system and receives something other than the desired result, the immediate instinct is to assume that the speech recognizer made an error in interpreting what was spoken. System administrators, when reviewing the subsequent system metrics, often perceive poor application performance as being solely due to poor recognition accuracy.

This perception, however, makes many assumptions about the other vital components that make up the speech application as a whole, components that can dwarf the impact of raw recognizer accuracy on the application’s performance.

Successful speech-enabled call routing solutions are made up of many separate components that work together to produce the desired result: connecting the caller to the intended destination according to a spoken name or phrase. These components vary slightly from vendor to vendor, but in general a successful speech solution has five major components:

1). The Speech Recognizer: The speech recognizer acts as the “interpreter” for the speech-enabled call routing system, matching a spoken name or phrase with an entry in the system’s directory. Often disparaged in contemporary society (see any number of comic strip plots pertaining to poor speech recognizer performance for reference), the speech recognizer is the caller-facing component of the call routing solution (along with the dialogues).

2). The Grammar: The application component commonly referred to as the grammar helps the speech solution determine the spoken request to connect the caller accurately. A well-formed grammar defines for the recognizer the expected words (vocabulary), pronunciation of the words, and the grammatical structure of the caller requests. For a call routing solution, the vocabulary includes the names of requested destinations. These are often drawn from a directory listing (see The Directory in the next section).

To address pronunciations of the directory items, today’s speech solutions generally utilize “dictionaries” of common terms and names. No dictionary, however, can cover 100 percent of the entries for a speech solution, leaving some percentage of names with questionable pronunciations. Consequently, attention must be given to pronunciations so that the recognizer can correctly match what the caller requested. Pronunciation is also important when confirming the request back to the caller. If a name is correctly recognized but is mispronounced during confirmation, the caller may reject it as an inaccurate match.

When addressing typical caller behavior, the grammatical structure is also an important consideration. Programming the grammar to handle common challenges in caller speech, such as vocal pauses and non-sequitur filler words (like, please, umm, and so forth), improves solution performance.
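
As a rough illustration of filler handling, the Python sketch below strips a small set of filler words from an already-recognized utterance before looking it up in the directory. A production grammar would normally model fillers inside the recognizer itself; the filler list, the function names, and the sample directory are assumptions made for this example and do not describe any vendor’s product.

```python
import re
from typing import Optional, Set

# Hypothetical filler vocabulary; a real deployment would tune this from call logs.
FILLERS = {"um", "umm", "uh", "like", "please", "thanks"}


def strip_fillers(utterance: str) -> str:
    """Lowercase the utterance and drop punctuation and filler words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    return " ".join(word for word in words if word not in FILLERS)


def match_destination(utterance: str, directory: Set[str]) -> Optional[str]:
    """Return the directory entry the caller asked for, or None if there is no match."""
    candidate = strip_fillers(utterance)
    return candidate if candidate in directory else None


# Illustrative directory of spoken destinations (made-up entries).
directory = {"jane smith", "accounts payable"}
print(match_destination("Umm, Jane Smith, please", directory))  # jane smith
```

Note the trade-off in this simplified approach: a filler such as “like” is removed everywhere, so a destination whose name contains that word would need special handling.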

3). The Directory: The importance of developing and maintaining a highly accurate directory cannot be overstated. Focused effort is needed to minimize any gaps between what a caller might request and what destinations are actually available in the directory. Identifying and including all possible caller requests (employees, departments, contractors, vendors, etc.) helps close this gap, increasing the likelihood that a caller will successfully reach the intended destination and subsequently use the speech solution for future calls.

In addition, the directory’s content needs ongoing attention to combat the effects of churn within any large enterprise. Churn refers to dynamic changes within an enterprise call directory due to employees joining or leaving the organization, employee name changes (married versus maiden names, etc.), office changes, local area code exchange changes, physical site relocations, etc. It is estimated that churn within the typical enterprise’s call directory can average as much as 40 percent annually. Without a stringent protocol for addressing churn, performance and usage of the speech solution will quickly begin to wane.
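
One way to keep churn visible is to compare successive directory snapshots. The sketch below is a minimal Python example under stated assumptions: the directory is modeled as a mapping from spoken name to extension, and churn is counted as entries added, removed, or changed relative to the earlier snapshot. Both the data model and that definition of churn are illustrative, and the names and extensions are made up.

```python
from typing import Dict, Set


def directory_churn(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, Set[str]]:
    """Classify directory entries as added, removed, or changed between two snapshots."""
    added = set(current.keys() - previous.keys())
    removed = set(previous.keys() - current.keys())
    changed = {
        name
        for name in previous.keys() & current.keys()
        if previous[name] != current[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


def churn_rate(previous: Dict[str, str], current: Dict[str, str]) -> float:
    """Assumed definition: touched entries divided by the size of the earlier snapshot."""
    if not previous:
        return 0.0
    diff = directory_churn(previous, current)
    touched = len(diff["added"]) + len(diff["removed"]) + len(diff["changed"])
    return touched / len(previous)


# Made-up snapshots: one name change and one extension change.
last_quarter = {"Jane Smith": "1001", "Bob Lee": "1002", "Ann Wu": "1003",
                "Tom Roy": "1004", "Sales": "2000"}
this_quarter = {"Jane Jones": "1001", "Bob Lee": "1012", "Ann Wu": "1003",
                "Tom Roy": "1004", "Sales": "2000"}
print(directory_churn(last_quarter, this_quarter))
```

Under this definition a single name change counts twice, once as a removal and once as an addition, which is a reasonable reflection of the work required: the old spoken name must be retired and the new one added with a verified pronunciation.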

4). The Routing Table: The routing table is key to the successful transfer of a phone call. Even if the dialogue is properly set up, the directory has been scrubbed, and the destination is identified by the speech recognizer, the call will not be routed to the desired destination if the routing table has been set up incorrectly.

Not only does the routing table tell the call routing system which numbers to dial (and in what sequence) when a certain name or department is spoken, but it also ensures that the transfer process conforms to the protocol required to correctly use the dialing pattern.

The routing table is also used to direct the system to the least-cost-routing dialing pattern for each destination, providing considerable cost savings to an enterprise’s telecom infrastructure over the course of a year. In addition, the routing table allows different PBXs (or PBXs running different software releases), each of which may require a different call transfer protocol, to share the same directory content and coexist within the call routing solution.
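
The role of the routing table can be sketched in a few lines of Python. This is a simplified model rather than any product’s actual schema: the `Route` fields, the protocol labels, the cost ranking, and the dial strings (fictional 555 numbers) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    dial_string: str        # digits the system dials to reach the destination
    transfer_protocol: str  # illustrative labels, e.g. "blind" or "supervised"
    cost_rank: int          # lower means cheaper; the ranking scheme is assumed


# Hypothetical table: one recognized destination can have several candidate routes.
ROUTING_TABLE: Dict[str, List[Route]] = {
    "jane smith": [
        Route("915085550134", "blind", 2),  # external trunk
        Route("71234", "supervised", 1),    # internal extension
    ],
}


def pick_route(destination: str, table: Dict[str, List[Route]]) -> Optional[Route]:
    """Return the least-cost route for a recognized destination, or None if unrouted."""
    routes = table.get(destination)
    if not routes:
        return None
    return min(routes, key=lambda route: route.cost_rank)


route = pick_route("jane smith", ROUTING_TABLE)
print(route.dial_string, route.transfer_protocol)  # 71234 supervised
```

A destination that is recognized correctly but missing from the table returns `None` here, which mirrors the failure described above: the recognizer did its job, yet the call still cannot be connected.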

5). The Dialogues: The correct dialogue is important to the success of the caller experience because it directs the caller on how to interact with the system. The dialogue needs to give specific instructions on what the caller should request (first and last name, name of the department, state and ZIP code, etc.) to ensure that the request matches the way the destination is listed in the directory. Without these instructions, the caller is more likely to say the wrong thing, and the request will not match a destination in the directory.

The dialogue has to be specific and give instructions WITHOUT being too lengthy; otherwise the caller will become frustrated and simply zero out to the operator or hang up. This lowers the connection rate, and callers will subsequently not embrace the speech-enabled call routing solution for future calls.

Simply blaming the speech recognizer for poor system performance makes many assumptions concerning the other components that make up the solution as a whole, assumptions such as:

  • The directory is 100 percent accurate, containing every possible destination within the enterprise at any given time;
  • The routing table is correctly programmed to successfully connect every caller to every destination; and
  • The dialogues are brief enough to encourage system use, yet are detailed enough to provide the level of information each caller requires for a successful connection.

Of course, these assumptions are unrealistic in real-world applications. It is nearly impossible to predict and address the various influences that can cause speech-enabled call routing solution errors before they can negatively impact performance. It is possible, however, to understand where these errors may originate, and to actively monitor the system to correct issues and thus mitigate their impact on overall system performance.

While the speech recognizer has unfairly taken the brunt of the blame for system errors, it is actually not the most error-prone component of typical speech-enabled call routing solutions. During the past 14 years (and with hundreds of millions of calls connected), Parlance has found that for the majority of installed customers, the source of application errors can be ranked as follows:

  • The directory;
  • The grammar (structure & pronunciations);
  • The speech recognizer;
  • The dialogues; and
  • The routing table.

Now that we have taken a closer look at the potential error sources in the other components of the speech-enabled call routing solution, the bigger picture should be coming into focus. No single component is responsible for poor system performance. Rather, it is the lack of continuous maintenance of the system as a whole that can lead to performance degradation. Dialogues, directories, routing tables, grammars, and speech recognizers all exist in a dynamically changing enterprise environment, and without ongoing maintenance and fine-tuning, the accuracy (and thus performance) of the system will erode over time.


Joseph Maxwell is chief operating officer of Parlance Corp.
