Human Barriers in the Use of Voice Services

In the early days of speech recognition, two major applications seemed to excite people. The first was the voice-activated typewriter: just dictate a letter into the microphone and a printed hard copy comes out. The other was the voice-activated dialer: just say a name into the phone and the call is placed. Today, these dreams are reality. Let’s examine some of the barriers that must be overcome in voice dialing before these services are widely accepted.

The premise of voice dialing is simple: you say a name or another command word, the phone system understands it, and the corresponding phone number is dialed automatically. While this sounds simple enough, several circumstances complicate it in practice.

The first barrier is gaining user acceptance of current recognition accuracy. Until recently, recognition performance was not sufficient for success, but today many recognizers are good enough to use over a telephone line. Previously, if a recognizer misunderstood you on your first try, you would probably think that you had not pronounced the word correctly and that the fault was yours. If the same thing happened on your second try, you would perceive the quality as low and blame the system. The technology was regarded as immature and less than useful. This happens less often today. However, noisy environments still cause difficulties. Continued development of better algorithms and improved acoustic representations are making recognizers more robust.

The second barrier is training the system to create a personal phone book. All your preferred names and numbers from your notebook must be entered into the phone. Before the late 1990s, this training was done exclusively by voice. The user had to repeat every name one or more times and then assign the number to be dialed for each name. This procedure took a while and was normally attempted only once by the user.
Then the service was operational. If the stored data was lost for any reason, very few people were motivated to retrain the system. Many services were abandoned, making users even more skeptical of the technology.

Today, many phones, especially mobile phones, have built-in voice dialing. But most phone books for voice dialing still have to be trained by the user and are speaker dependent. If the phone is lost or stolen, the personal phone book and voice data are gone. With network-based voice-activated dialing, the phone book is always available for dialing or checking numbers, even if you lose your mobile phone. Moreover, names and numbers can be entered over the Internet on a personal Web page, or even imported from a PC-based version of the phone book. The typed names are automatically converted to a speaker-independent vocabulary name list, thereby overcoming the barrier of system training.

Another barrier for voice dialing is privacy. It is hard to keep a call private when you use the service in public or in a group where you don’t want the people around you to know whom you are calling. So, instead of voice dialing, you may want to use the keypad. An alternative to the keypad could be to use nicknames in your phone book.

Finally, a potential barrier to using intelligent, voice-activated services may be our unwillingness to change our behavioral patterns. There is always the chance that we will forget to use the services and continue to dial the number as before, even if a perfect voice dialing system is available. This may be the real threat to the new alternative of voice dialing: should I, would I, could I change my habits?

Dr. Fred J. Lundin is a speech application developer at Telia, Sweden. He can be reached at fred.lundin@telia.com.
