Is Your VUI Out of Tune?
Tuning an application can have a number of meanings. Usually it carries the speech scientist's connotation of adjusting parameters in the speech recognition engine to improve recognition accuracy. But tuning an app can also refer to activities that do not involve getting under the hood of an automated speech recognition (ASR) engine. These might include testing prompts for functional effectiveness, trimming prompts of unnecessary content, determining the rate of false positives, adjusting timers, altering barge-in settings, and so forth.
How do you know if your voice user interface (VUI) is out of tune? Most ASR engines and interactive voice response (IVR) platforms now support some sort of event logging capability. A designer can often create a valuable data set simply by turning this feature on. Some mining of the data can be automated, but uncovering and tuning other problems still requires manual review. For example, suppose you want to determine the rate of false positives (also known as false acceptances) that occur in a particular state in your IVR. Logging what the system says, what the user says, and what the system subsequently does cannot tell the whole story.
Consider a case wherein the system asks, "Which account information would you like: checking, savings, or credit card?" to which the user responds. The system hears his utterance and proceeds to the next state, announcing, "OK, checking account information!" This would, of course, be wonderful if the user had actually said "checking," but when we manually listen to the user's logged utterance, we hear that what he really said was "savings." Finding such false positives can be a challenge, but using statistical sampling techniques makes the job considerably easier. For instance, taking a random sample of human-computer interactions in the state under scrutiny will usually tell the story.
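The sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the shape of the log records, and the use of a normal-approximation 95 percent interval are all hypothetical choices, not features of any particular ASR engine or IVR platform. A reviewer listens to each sampled utterance and records what was actually said next to what the system recognized.

```python
import math
import random


def sample_for_review(interactions, sample_size, seed=0):
    """Draw a simple random sample of logged interactions for manual review.

    `interactions` is assumed to be a list of log records for one IVR state.
    A fixed seed keeps the sample reproducible between review sessions.
    """
    if sample_size >= len(interactions):
        return list(interactions)
    return random.Random(seed).sample(interactions, sample_size)


def false_positive_rate(reviewed):
    """Estimate the false positive rate from manually reviewed samples.

    `reviewed` is a list of (recognized, heard_by_reviewer) pairs, where the
    second item is what the human reviewer heard in the recording.
    Returns (rate, lower, upper) using a normal-approximation 95% interval.
    """
    n = len(reviewed)
    if n == 0:
        raise ValueError("no reviewed interactions")
    errors = sum(1 for recognized, heard in reviewed if recognized != heard)
    rate = errors / n
    margin = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return rate, max(0.0, rate - margin), min(1.0, rate + margin)


if __name__ == "__main__":
    # Hypothetical review results: nine correct acceptances, one false positive.
    reviewed = [("checking", "checking")] * 9 + [("checking", "savings")]
    rate, low, high = false_positive_rate(reviewed)
    print(f"estimated false positive rate: {rate:.2f} ({low:.2f} to {high:.2f})")
```

With a sample this small the interval is wide, which is a reminder that the sample size should be chosen with the precision you need in mind.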
Tuning out or lowering a false positive rate could be as simple as altering the prompt that precedes the false positive or adjusting the state's grammar. In the contrived example above, the words "checking" and "savings" may not be phonetically distinct enough. Adding a distinguishing component to both prompt and grammar could help. For instance, let's say the bank's checking product was marketed as "Freedom Checking." Adding the word "freedom" to both prompt and grammar would reduce the rate of false positives if, in fact, users tended to mimic the prompt content. Alternatively, adjusting the state's confidence level may help solve the problem. Of course, the only way to know for sure would be to implement some sort of change in the prompt, the grammar, or both, and monitor its effects.
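To see what adjusting a confidence level might buy, one can replay reviewed utterances against several candidate thresholds. The sketch below assumes, hypothetically, that each logged utterance carries a confidence score between 0 and 1 and that a reviewer has marked whether the recognition result was correct; real engines differ in how they report and apply confidence, and the function name is invented for illustration.

```python
def threshold_tradeoff(samples, thresholds):
    """Compare candidate confidence thresholds on reviewed utterances.

    `samples` is a list of (confidence, was_correct) pairs. For each
    threshold, returns (threshold, false_accept_rate, false_reject_rate),
    with both rates expressed as a share of all samples.
    """
    total = len(samples)
    if total == 0:
        raise ValueError("no samples")
    rows = []
    for t in thresholds:
        # Wrong results that would still be accepted at this threshold.
        false_accepts = sum(1 for c, ok in samples if c >= t and not ok)
        # Correct results that would be rejected at this threshold.
        false_rejects = sum(1 for c, ok in samples if c < t and ok)
        rows.append((t, false_accepts / total, false_rejects / total))
    return rows


if __name__ == "__main__":
    # Hypothetical reviewed utterances: (confidence, was_correct).
    samples = [(0.9, True), (0.8, True), (0.6, False), (0.55, True), (0.4, False)]
    for t, fa, fr in threshold_tradeoff(samples, [0.5, 0.7]):
        print(f"threshold {t}: false accepts {fa:.0%}, false rejects {fr:.0%}")
```

Raising the threshold trades false positives for rejections of utterances that were recognized correctly, so the comparison shows the cost of each setting before it is tried on live callers.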
Testing prompts for functional effectiveness is a fundamental tuning activity. It involves determining whether a prompt actually does what its creator intended. Prompts are, of course, crafted for a specific behavioral effect. In our labored example, the VUI designer needs the user to say either "checking," "savings," or "credit card" to proceed. But for the sake of argument, let's say that two users call to get information about their debit cards. They both know that their debit card is used for making drafts on their checking account, but that the draft is mediated through a credit card-like electronic transaction. They both call with the words "debit card" on their minds. The question is, how will they map those words onto the available choices?
Consider finding the following responses to the "checking, savings, or credit card" prompt in a data log:
User: (after a silence) Checking?
User: (after a silence) Credit card?
User: (after a silence) Debit card?
The problem here is that the prompt ("Which account information would you like: checking, savings, or credit card?") offers no option for people who cannot relate the purpose of their call, at least in terms of the way that they conceptualize it, to the available choices.
An effective solution could be as simple as: "Which account information would you like: checking, savings, credit card, or none of the above?"
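Gaps like the debit card case can be surfaced by tallying transcribed responses that fall outside the state's grammar. The following is a minimal sketch, assuming the logged utterances have already been transcribed to text and that the grammar can be represented as a flat list of phrases; the function name and data are hypothetical.

```python
from collections import Counter


def out_of_grammar_tally(transcripts, grammar):
    """Count transcribed responses that match no phrase in the grammar.

    Matching is deliberately simple: case is ignored and trailing
    punctuation is stripped before an exact comparison.
    Returns (phrase, count) pairs, most frequent first.
    """
    allowed = {phrase.strip().lower() for phrase in grammar}
    tally = Counter()
    for text in transcripts:
        cleaned = text.strip().lower().rstrip("?.!")
        if cleaned not in allowed:
            tally[cleaned] += 1
    return tally.most_common()


if __name__ == "__main__":
    grammar = ["checking", "savings", "credit card"]
    transcripts = ["Checking?", "Credit card?", "Debit card?", "debit card", "Savings"]
    print(out_of_grammar_tally(transcripts, grammar))
```

A phrase that shows up repeatedly in this tally is a candidate for a new prompt option, a grammar addition, or an escape choice such as "none of the above." Hesitant in-grammar answers such as "Checking?" would not appear here and still call for listening to the recordings.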
Walter Rolandi is founder and owner of The Voice User Interface Co. in Columbia, S.C. He provides consultative services in the design, development, and evaluation of telephony-based voice user interfaces and evaluates ASR, TTS, and conversational dialogue technologies. He can be reached at firstname.lastname@example.org.