Tuning Is a Multitiered Process
As the speech market continues to grow, speech applications are becoming more mainstream. This increased penetration brings definite benefits, such as higher customer satisfaction and reduced operating costs, but speech applications are more complex than traditional touch-tone applications and therefore require additional care and feeding. One aspect of that care and feeding is the tuning process. Tuning can be defined as "the analysis of caller and system data to achieve maximum performance and ensure user satisfaction."
Tuning is one of the keys to the success of a speech application; it is also one of the most neglected parts of the care and feeding process. The intent of this article is not to give an exhaustive description of tuning, but rather to outline a number of key activities involved in the tuning process and to highlight some best practices.
Grammar Tuning This process verifies that the various grammars contained within an application provide accurate and comprehensive coverage of caller utterances. During this phase of tuning, you need to compare transcribed caller utterances against the recognition strings generated by the speech engine for every grammar in the application. The data is analyzed to identify valid caller utterances that were not included in the initial grammar development. Once identified, unrecognized utterances need to be reviewed by subject matter experts to determine whether they are valid candidates for addition to the grammars. Finally, you should review the grammar data to identify words and phrases that callers rarely or never use and that can therefore be removed.
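The comparison described above can be sketched in a few lines. This is a minimal illustration, not a production tool; the data format, the `find_grammar_gaps` name, and the sample utterances are all hypothetical, and real tuning data would come from call logs and transcription files.

```python
from collections import Counter

def find_grammar_gaps(calls, in_grammar):
    """Tally transcribed caller utterances that fall outside the grammar.

    `calls` is an iterable of (transcription, recognition_string) pairs;
    `in_grammar` is the set of phrases the grammar currently covers.
    """
    out_of_grammar = Counter()
    for transcription, recognition in calls:
        # An utterance the grammar does not cover can never be recognized
        # correctly, no matter how well the engine performs.
        if transcription not in in_grammar:
            out_of_grammar[transcription] += 1
    # Most frequent gaps first: the strongest candidates for review by
    # subject matter experts and possible addition to the grammar.
    return out_of_grammar.most_common()

calls = [
    ("checking", "checking"),
    ("my checking account", "checking"),   # valid intent, not in grammar
    ("my checking account", "savings"),    # same gap, misrecognized too
    ("savings", "savings"),
]
grammar = {"checking", "savings"}
print(find_grammar_gaps(calls, grammar))
# [('my checking account', 2)]
```

Frequency counts matter here: a phrase heard hundreds of times is a grammar gap, while a one-off utterance may not be worth covering.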
Dictionary Tuning This aspect of the tuning process increases recognition accuracy through the creation of custom pronunciation dictionaries. As part of the dictionary tuning process, grammar data is reviewed to identify valid caller utterances that were not correctly recognized by the speech engine. As these utterances are identified, the recordings need to be placed into a queue for further analysis by a linguist. After the utterances are reviewed, custom dictionaries can then be created or augmented to cover the newly defined alternate pronunciations.
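Building that review queue is mechanical: pull out every utterance where the transcription and the recognition string disagree. A minimal sketch, with a hypothetical record format and file names chosen purely for illustration:

```python
def build_review_queue(results):
    """Collect recordings where a valid, in-grammar utterance was
    misrecognized -- the candidates for custom pronunciation work.

    `results` holds dicts with the human transcription, the engine's
    recognition string, and the path to the caller's recording.
    """
    queue = []
    for r in results:
        if r["transcription"] != r["recognition"]:
            # The caller said something the grammar covers, but the
            # engine heard it differently; a linguist should listen to
            # the audio and decide whether an alternate pronunciation
            # belongs in the custom dictionary.
            queue.append(r["audio_path"])
    return queue

results = [
    {"transcription": "omaha", "recognition": "omaha",
     "audio_path": "utt_001.wav"},
    {"transcription": "papillion", "recognition": "papillon",
     "audio_path": "utt_002.wav"},
]
print(build_review_queue(results))
# ['utt_002.wav']
```

The linguist, not the script, makes the final call: some mismatches are noise or transcription errors rather than genuine pronunciation variants.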
Recognition Parameter Tuning The idea behind this activity is to optimize speech engine performance through the manipulation of platform- and application-level parameters. Once the application grammars and dictionaries have been adjusted, you should turn your attention back to the grammar data to identify opportunities to optimize the recognition engine parameters. Some items to look for during this analysis are truncated utterances and long recognition latencies. A truncated utterance, especially during the capture of digit strings, usually indicates that callers are pausing longer than anticipated. These issues can normally be rectified by adjusting end-pointing parameters. Latency issues can be addressed in various ways, including adjusting speed/accuracy parameters to achieve the highest recognition accuracy without sacrificing system response time. It is important to note that recognition parameter adjustments should be tested via controlled experiments on dedicated tuning servers, and the results of these tests should be documented accordingly.
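On a VoiceXML platform, these adjustments typically take the form of recognizer properties. The fragment below is a sketch, not a recommendation: the property names (`incompletetimeout`, `completetimeout`, `speedvsaccuracy`) come from the VoiceXML 2.0 generic speech recognizer properties, but vendor platforms vary in which properties they honor, and the values shown are hypothetical starting points that would need to be validated on a dedicated tuning server.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="capture_account_number">
    <!-- Allow longer mid-utterance pauses so digit strings are not truncated. -->
    <property name="incompletetimeout" value="1.5s"/>
    <!-- End-point quickly once a complete grammar match has been heard. -->
    <property name="completetimeout" value="0.5s"/>
    <!-- Bias toward accuracy; lower this value if recognition latency grows. -->
    <property name="speedvsaccuracy" value="0.7"/>
    <field name="account_number" type="digits">
      <prompt>Please say your account number.</prompt>
    </field>
  </form>
</vxml>
```

Note the trade-off the two timeouts encode: raising `incompletetimeout` prevents truncation but makes every digit capture slower, which is exactly why such changes belong in a controlled experiment rather than a live edit.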
Dialogue Tuning This aspect of tuning validates whether prompts elicit the types of responses that were anticipated in the design process. During this process, review the grammar data to identify consistent responses that fall outside what was expected for individual states within the application. If areas of concern are identified, review the scripting and recorded prompts associated with the affected grammars. The script review should ensure that the states in question are written clearly and concisely. Likewise, the prompt review needs to ensure that the recordings in question have the correct prosody and intonation. This is often referred to as the art of speech system design.
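Finding the states worth reviewing can be reduced to a per-state rate calculation. This sketch assumes a hypothetical event format, a `states_needing_review` name, and an illustrative 10 percent threshold; the threshold a real project uses would depend on its traffic and accuracy goals.

```python
from collections import defaultdict

def states_needing_review(events, threshold=0.10):
    """Flag dialogue states where callers consistently respond outside
    what the design anticipated.

    `events` is an iterable of (state_name, was_expected) pairs drawn
    from the grammar data; a state whose out-of-expectation rate
    exceeds `threshold` becomes a candidate for script and prompt review.
    """
    totals = defaultdict(int)
    unexpected = defaultdict(int)
    for state, was_expected in events:
        totals[state] += 1
        if not was_expected:
            unexpected[state] += 1
    return sorted(
        state for state in totals
        if unexpected[state] / totals[state] > threshold
    )

events = (
    [("main_menu", True)] * 9 + [("main_menu", False)]   # 10% unexpected
    + [("transfer_funds", True), ("transfer_funds", False)]  # 50% unexpected
)
print(states_needing_review(events))
# ['transfer_funds']
```

The script only narrows the search; judging whether the fix is a wording change, a re-recorded prompt, or a grammar addition remains the designer's call.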
Each of the processes outlined requires a solid understanding of enterprise business and customer goals, grammar development, and speech engine parameters. Tuning is time-consuming and can be expensive; however, the rewards realized through it are invaluable to the overall success of a speech application.
Aaron Fisher is director of speech services at West Interactive, overseeing the design, development, and implementation of speech applications for the company. He can be reached at firstname.lastname@example.org.