Editor's note: This is Part 2 of a two-part series in which Peter Fleming explores speech and the Internet. Part 1 appeared in the September/October issue of Speech Technology Magazine.
The joining of Internet capability with television has opened up new possibilities for speech as an interface. Video and audio telephone connections are now possible over conventional telephone lines as well as over faster links such as fiber optic connections. "Interactive television," which could also make use of speech recognition, is as yet a largely unexplored technology. To compete with cable companies, some telephone companies are now offering bundled television by satellite.
Internet telephone calls are becoming more frequent, either through Internet voice chat (available through many services) or through a growing number of Web sites devoted to Internet telephony. One such site, DialPad.com, allows not only computer-to-computer calls but also unlimited free calls to conventional telephones within the United States. Proper microphone and sound-level adjustments are sometimes necessary, but even with them one needs a good Internet connection for the call quality to be good. The unpredictable quality of Internet connections, especially at busy times of day, severely limits what can be accomplished in this way. However, fiber optic connections available through cable television/Internet companies may partially solve this problem, especially if both parties have fast lines. Satellite connectivity may also prove to be the answer.
On the horizon is a generation of handheld communication devices that will not only check e-mail over conventional and cellular telephone connections but also allow inexpensive long-distance Internet telephony, using hardware calibrated for this purpose.
Banking is another area with future potential for Internet speech recognition applications. Money is increasingly being moved electronically as more banking occurs on-line.
Stock trading and conventional banking already take place through Web sites, which could in principle be controlled by voice, and this may become an increasingly profitable area. A related application where speech has potential is personal and corporate finance, where data may be entered by voice as well as by keyboard or touch screen. In the future these modalities may be integrated to work together as a unit.
People are now able to access health-care databases from their homes or offices, and cameras attached to computers allow diagnosis to take place at a distance. Medical records prepared with speech recognition are being stored in digitized form, and other medical information could potentially be sent over the Internet. Even X-rays, tissue biopsy images and other visual information can be sent rapidly over long distances via the Net.
In the legal world, documents can now be prepared using speech recognition. The forms are digitized rapidly and sent via the Internet at high speed, connecting remote offices.
"Intelligent machines" with which the user can interact, asking questions or giving commands, are already in the research phase. These machines could carry out household functions such as turning appliances on and off, checking heating systems, checking on children and monitoring house or garden temperature. They could also track industrial parameters, electricity use, industrial productivity indices and the geographic whereabouts of individuals. All this can be done through speech recognition interactions between a person and a machine, carried out over long-distance Internet connections. Artificial intelligence mavens have long been working on robotic devices and other interactive tools. The day may not be too far away when one may inquire whether the laundry is finished, the house is clean, the shopping is done and the bills are paid, through speech recognition interaction with a computer over the Internet.
Users could dictate their needs and carry out activities of daily living through speech recognition on the Net. Such ideas are not far-fetched at this point: the technology to carry them out is already available and being marketed.
Peter Fleming, a speech recognition consultant, can be reached by telephone at (617) 923-9356 or by e-mail at firstname.lastname@example.org.