Editor's note: This is Part 1 of a two-part series in which Peter Fleming explores speech and the Internet. Part 2 appeared in the November/December issue of Speech Technology Magazine.
Within the world of speech technology, the greatest profits thus far have been made in the telephony area. And within the telephony area, interesting things are happening with regard to speech recognition.
Speech technology is finding a home as an input method for the Internet, opening more doors for our segment of the electronic revolution. Speech recognition command-and-control for operating a computer has met with limited adoption, except among those who need it because of physical challenges or occupations requiring hands-free operation. But navigating the Internet by voice is now becoming popular, finding a particularly visible niche in mobile devices.
One of telephony's developments involves screen readers: text-to-speech programs that can read e-mail aloud in a machine voice. This capability is becoming available on telephone services, where a user can place a call and have e-mail played back. In this context, the user can respond by voice, either directly to a telephone answering machine or as a voice file attached to an e-mail.
Dictation transcribed directly over the telephone is not being used extensively, however. Although it is possible, it is not yet popular because of reliability issues stemming from the relatively narrow bandwidth of the telephone, which cramps frequencies, alters sound quality, and greatly lowers and varies fidelity. There are also challenges in trying to correct, by voice and without a screen, material that has been dictated and incorrectly recognized over the telephone.
Call centers are big news in the telephony area. Businesses report that money is being saved and profits made by replacing telephone operators with speech technology systems. For some years now, large telephone companies have used computer recognition of the words "yes" and "no" to accept or reject collect telephone calls and third-party billing, and in other settings where simple answers suffice. Similarly, the recognition of digits over the telephone has been quite accurate, and small-vocabulary command systems are quite reliable.
These systems allow the user to retrieve voice or e-mail messages, which are then read or played back, as well as to send telephone messages and make calls through spoken commands. An example of that type of system was My Talk (MyTalk.com from GeneralMagic.com), which until recently offered free telephone service in exchange for the user's exposure to advertisements. Free accounts were set up by e-mail on the Web site, and over half a million users subscribed. Such a system has certain inherent advantages. The company offered free voice mail, as well as the ability to send two-minute telephone messages to other subscribers on the system. It also offered users free two-minute telephone calls within the United States in exchange for listening to one or two short advertisements. Similar services exist, some of them paid for by subscription rather than by listening to advertising.
Other companies in that field include Evoice.com, which offers free telephone answering and retrieval, free e-mail, e-mail retrieval by phone, and even phone message retrieval by e-mail. Free long-distance telephone calls are not available in its system except through a link to Dialpad.com. ThinkLink.com offers a free voice-mail service with retrieval by phone or through the Internet. In addition, it offers less expensive long-distance options: calls from a local access number at 5 cents per minute; an 800 number on which people can call the user for 10 cents per minute; and long-distance calls made from an 800 number for 15 cents per minute.
Dictation speech recognition has met with limited acceptance for composing e-mail. Although it is used extensively by doctors, lawyers, journalists, writers and others preparing longer documents, it is not usually used for the short writing associated with electronic mail and other Internet applications.
The problem of interfacing speech recognition systems with e-mail software adds a level of complexity and tends to slow down the speech recognition system, making the correction of recognition errors more difficult.
Peter Fleming, a speech recognition consultant, can be reached by telephone at (617) 923-9356 or by e-mail at email@example.com