Speaking of the Internet – Speech Equals Access for All

When the U.S. Congress amended the Rehabilitation Act to ensure equal access to technology for people who are visually or physically impaired, the legislation (known as Section 508; see sidebar) potentially benefited many more people than those with disabilities. Boiled down, Section 508 states that it's illegal for a service provider to discriminate against a disabled person by refusing to provide any service which it provides to members of the public. Personal computer makers and Internet service providers (ISPs) are not exempt from Section 508 – all users, regardless of ability or age, must have equal provision.

We now have a greater ‘graying’ public than at any previous time. The proportion of the population aged over 55 in the United States (and parts of Asia) is higher than ever. Older people require and deserve equal access to the Internet, the "information superhighway," but they may be impaired by other, ‘techno-physical’ factors. They may have no personal computer or only limited access to one, and their ISP connection may be frustratingly slow (dial-up vs. DSL). In addition, they may have fewer technical skills or less confidence, and reading a computer screen may be hampered by decreasing visual acuity. So how do (older) people with some degree of visual impairment fare with the Internet? What are their most common impediments, and how can speech technology help overcome them? What if anyone, regardless of age or ability, could surf the Internet, check email, and shop online without a computer? It’s now possible.

How ‘Wide’ Is the World Wide Web?
The Internet is an invaluable communications medium and the most inclusive source of information and commerce ever known.  As essential as many of us find the Internet, it’s astonishing how many people have only patchy access to this resource.  Only those who can afford a PC and an Internet connection can become full members of the ‘digital community.’  In many cases (e.g. in either mobility-impaired or highly mobile situations), ease of use and having a good network connection become more important than affordability.  By most counts, the number of Internet users is small compared to the numbers that would use it if they could.  PC users connected to the Internet number about 260 million; the total number of PC users (both connected and not connected to the Internet) was about 400 million in 2003. Now technologies like voiced Internet, voice portals, and voice-based services exist to remedy the situation. Everyone can access the Internet by phone - anytime and anywhere.  Mobile users have been waiting for this for a long time, and it opens new avenues of Internet access to elderly or disabled people with special needs.

Many more people have telephones than computers; the average household in the U.S. contains a minimum of three phones. Cellular networks have allowed mobile phone use to grow at an incredible rate worldwide, especially in Asia, Europe, and Latin America. In 2003 the total number of terrestrial telephone lines reached one billion, while the number of wireless telephone subscribers exceeded one billion. According to the Yankee Group, this rapid growth in the wireless market is projected to continue, resulting in over 2.5 billion worldwide wireless subscribers by 2008. Such growth far outstrips that of PC use. So the Internet is at first glance accessible by only a small fraction of the world’s population; the majority is left out (a gap called the ‘digital divide’). In addition to being ubiquitous, phones are familiar devices to all generations; standard phones are much easier to use than computers and they work with a natural interface – speech. Incidentally, bridging the digital divide will not be complete if the ‘language divide’ is not also bridged. Approximately 80 percent of content found on the Internet is written in English. Thus people from countries like Japan and China, where English is not common, cannot view much Internet content. Fortunately, this problem can also be addressed through existing technologies.  

Decreasing Vision and Aging
By 2025 the population of people aged over 65 in the United States will be six times higher than in 1990 because ‘baby boomers’ are aging, and overall life expectancy is increasing.  Loss of vision is the disability people fear most. What can technology provide to quell the fears of the aging and vision-impaired populations about lack of access to electronic information?

Completely sightless Internet users may use a ‘screen reader’ to read aloud the content of Web pages. Screen readers process HTML code and then decipher what needs to be read aloud, or ignored.  Such readers may be costly, or difficult to install, or incompatible with a PC’s operating system and/or existing software. In addition, navigation may be difficult because sightless users need to scroll and determine what's on the current page and on the next page. This  problem is essentially eliminated with ‘Page Highlight’ and matching algorithms (see Bridging the Divide with Phone-Internet Access) that render relevant contents when the user goes to a new page.

To take full advantage of the Internet, users with partial or poor sight need to enlarge text on Web pages. Internet Explorer users can do this by going to ‘View > Text size > Largest.’  However, text embedded within graphics cannot be resized and may cause difficulties.  People with poor vision may also use a screen magnifier to enlarge text. Again, text embedded in graphics may cause difficulties, appearing blurry and pixelated when magnified (Moss, 2004).

Color-blindness may be another disadvantage when viewing Web sites. Approximately one in 12 men and one in 200 women have some form of color-blindness (Source: IEEE). Additionally, some epileptic users must be careful to avoid flicker at frequencies between two and 55 Hz. Even where such effects do not provoke seizures, most people have similar objections to much advertising content on the Internet.  No one is particularly amused by pop-up figures, cookies, flashing animation, or anything that jiggles and jitters noisily.

How can all these consumers – those over 55, the visually impaired and those without a PC – be guaranteed Internet access without investing in more new equipment?  The solution lies in old equipment – the telephone.

Bridging the Divide with Phone-Internet Access
Phone-based Internet access can be provided using two approaches: visual browsers and voice browsers. A visual (micro) browser displays Web text on the small screen of a compatible cell phone, handheld computer or PDA.  Examples are WAP (Wireless Application Protocol) and I-Mode. WAP is one way to implement a ‘wireless Web’. It needs a special device and requires the Web site to be rewritten in Wireless Markup Language (WML).  Small-screen viewing, limited bandwidth, and the difficulty of entering text are the major limitations of WAP and similar approaches. Besides, WAP cannot be used in an eyes-busy, hands-busy situation. As a result, the adoption of WAP technology has been slow. Today, WAP-like approaches mainly provide wireless portals and wireless services; they do not provide complete wireless Internet access. Future generations of wireless devices will need greater bandwidth, larger screens, a better user interface and automated rendering of contents from the Web. Then adoption of WAP and similar devices may be expedited. Nonetheless, WAP is clearly not the solution of choice for older or blind people.

Voice browsers use only voice and any phone, eliminating the need for a special device. Voice browsing takes three forms: voice portals, voice Internet and voice-based services. Voice portals and voice-enabled Web sites use Voice Extensible Markup Language (VoiceXML) or Speech Application Language Tags (SALT) to rewrite Web site contents. Since there are more than a billion Web sites, rewriting these Web sites using VoiceXML/SALT is extremely costly and time-consuming. Web access using VoiceXML-based voice enabling has been limited because relatively few Web sites have been rewritten using VoiceXML. The limitations of WAP and the advantage of voice, which can be used in eyes-busy/hands-busy situations, have increased the attractiveness of voice browsers. By combining automatic speech recognition (ASR) and text-to-speech (TTS) technologies with many (costly) pre-recorded responses, voice portal providers such as BeVocal and Tellme give consumers telephone access to various Web content, typically news, weather, horoscopes, restaurants, directory information, etc. The contents are proprietary: they are created, copyrighted and maintained by the companies, so users cannot directly interact with the Internet – and hence, there are no Internet browsing, surfing, or searching capabilities.

Voice-based services include basic services like Interactive Voice Response (IVR) applications and Web-based services. Web-based voice services can be developed using VoiceXML or an existing Web site. Use of an existing Web site and voice Internet (see below) is preferable because:

  1. No re-writing is needed. The same visual Web site is used for both voice and visual access.
  2. There are substantial savings in implementation time and development costs.
  3. There is natural ‘rendering’ and navigation as opposed to creating too many menus/sub-menus and questions for large Web sites or applications.
  4. Direct accessibility to any other site or application on the Internet is available, which is not possible with VoiceXML unless the linked site is also rewritten with VoiceXML.

Voice Internet at Work – A Real Application
By transforming a telephone into a high-tech tool, voice (audio) Internet technology can provide complete access to all Internet contents.  In a typical scenario, a user calls the service, typically using a toll-free phone number, and is greeted by an Intelligent Agent (IA). The IA then reads a menu of choices for selecting Internet content. Users give simple commands, such as "Go to Yahoo" or "Read my email," to get the information they want, when they want it, whether they’re out on an appointment, stuck in traffic, sitting in an airport, or cooking dinner. Users can surf any Web site, search any word(s), send/receive email, and conduct e-commerce. They can locate timely and essential information, such as late-breaking news, stock prices, traffic reports, and get driving directions.


  • User calls a number
  • Intelligent Agent (IA) greets and verifies caller
  • IA asks for Web site name using voice
  • IA accesses requested Web site page, translates and renders the Web page content to plain text
  • IA then "reads" the rendered text to user
  • User can stop or redirect IA

Figure 1: Voice Internet: The Intelligent Agent (IA), available features, and how the user interacts with the IA and the Internet.
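
The call flow listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how netECHO or any other product works: the names `handle_call`, `render_to_text`, `listen`, `speak` and `fetch_page` are hypothetical, speech recognition and synthesis are stood in for by plain callables, and caller verification is left out.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(" ".join(data.split()))


def render_to_text(html):
    """Turn an HTML page into a list of plain-text chunks (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.parts


def handle_call(listen, speak, fetch_page):
    """One simplified call: ask for a site, fetch it, read it until told to stop.

    listen()         -> next caller utterance as text ("" for silence); assumed ASR stand-in
    speak(text)      -> says text to the caller; assumed TTS stand-in
    fetch_page(name) -> HTML of the requested site; assumed network stand-in
    """
    speak("Welcome. Which Web site would you like?")
    site = listen()
    html = fetch_page(site)
    for chunk in render_to_text(html):
        speak(chunk)
        if listen() == "stop":  # the caller may interrupt after any chunk
            break
```

Passing the three roles in as functions keeps the sketch independent of any particular speech engine or network library; a real service would plug recognizers, synthesizers and an HTTP client into the same slots.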

Automation is the key to creating a voice Internet, without rewriting Web sites. In the application, netECHO® from InternetSpeech (http://www.internetspeech.com/), the IA dynamically translates (renders) existing Web pages written in HTML, CHTML, XML, or other languages, into speech. There are no restrictions on sites that can be accessed, since all common mark-up languages (HTML, WML, XML, CHTML, VoiceXML, etc.) are supported.  The IA evaluates the site and determines which information is most useful and meaningful (a.k.a. ‘rendering’), then presents the content in easy-to-follow chunks using the ‘Page Highlights’ feature. After being given a short list of choices, callers are taken to the selected content on a linked page by simply saying which link they want. These steps are performed using the information on the visual Web page and patented algorithms, which take into consideration text contents, color, font size, links, paragraphs, and density of text to collect relevant information. Artificial intelligence (AI) techniques are used in the automated rendering process, mimicking how the human brain ‘renders’ a visual page by selecting the information of interest and then reading it. Figure 1 shows the IA, available features, and how the user interacts with the IA and the Internet.
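
To make the idea of automated rendering concrete, here is a minimal Python sketch of one simple heuristic: score each block of a page by how much of its text is ordinary prose rather than link text, give headings a bonus, and keep the best-scoring blocks as highlights. It is emphatically not the patented algorithm described above, whose details the vendor has not published here; the block tags, the heading weights and the function names (`page_highlights`, `score`) are all assumptions chosen for illustration, and color and font size are ignored.

```python
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "li"}
HEADING_BONUS = {"h1": 3.0, "h2": 2.0, "h3": 1.5}  # hypothetical weights


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts link characters in each."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = None
        self._in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
            self._current = {"tag": tag, "text": [], "link_chars": 0}
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text or self._current is None:
            return  # ignore whitespace and text outside known blocks
        self._current["text"].append(text)
        if self._in_link:
            self._current["link_chars"] += len(text)

    def flush(self):
        """Store the block being built, if it contains any text."""
        if self._current and self._current["text"]:
            self.blocks.append(self._current)
        self._current = None


def score(block):
    """Higher for long, prose-like blocks; near zero for pure navigation links."""
    text = " ".join(block["text"])
    link_density = block["link_chars"] / len(text)
    weight = HEADING_BONUS.get(block["tag"], 1.0)
    return weight * len(text) * (1.0 - link_density)


def page_highlights(html, limit=3):
    """Return the `limit` best-scoring blocks, in their original page order."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    collector.flush()
    blocks = collector.blocks
    best = sorted(range(len(blocks)), key=lambda i: score(blocks[i]), reverse=True)[:limit]
    return [" ".join(blocks[i]["text"]) for i in sorted(best)]
```

Even this crude measure separates a headline and a story paragraph from a navigation bar, which is the kind of distinction a listener needs when a page is read aloud.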

The user must be able to interact easily with the IA and get to the desired, correct Web contents. The IA ensures this by manipulating the information extracted from the Web page. Seamless navigation is achieved through use of voice commands such as ‘Page Highlights,’ ‘Repeat Page,’ ‘Back Page,’ ‘Next Paragraph,’ ‘Previous Paragraph,’ and ‘Skip Paragraph.’
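
A few of these commands can be sketched as a small state machine over the paragraphs of a rendered page. The Python below is an assumption-laden illustration, not the product's behavior: the class name `PageNavigator` is hypothetical, utterances are assumed to arrive as already-recognized text, and only ‘Next Paragraph,’ ‘Previous Paragraph,’ and ‘Repeat Page’ are covered because the article does not spell out exactly what the other commands do.

```python
class PageNavigator:
    """Tracks the caller's position in a page that has been split into paragraphs."""

    def __init__(self, paragraphs):
        if not paragraphs:
            raise ValueError("a page needs at least one paragraph")
        self.paragraphs = list(paragraphs)
        self.position = 0

    def handle(self, utterance):
        """Apply one recognized command and return the text to read aloud."""
        command = " ".join(utterance.lower().split())
        if command == "next paragraph":
            # Stay on the last paragraph rather than running off the end.
            self.position = min(self.position + 1, len(self.paragraphs) - 1)
        elif command == "previous paragraph":
            self.position = max(self.position - 1, 0)
        elif command == "repeat page":
            # Assumed meaning: start reading the page again from the top.
            self.position = 0
        else:
            return "Sorry, I did not understand. Please repeat the command."
        return self.paragraphs[self.position]
```

For example, after `nav = PageNavigator(["One.", "Two.", "Three."])`, calling `nav.handle("Next Paragraph")` returns the second paragraph, and `nav.handle("Repeat Page")` returns to the first.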

A similar Intelligent Agent (IA) may be used for business-to-business applications that let customers hear and interact with a company’s Web site, without a computer. The software lets customers get product and pricing information, check an order or account status, purchase products, or obtain product support, etc., using their voice. Voice Internet is useful for many industry horizontals (for example Call Centers/CRM, e-commerce and e-business) and verticals (e.g. banking, utilities, travel, and insurance).

As mentioned earlier, to close the digital divide, language translation is also essential. Here, again, automation plays an important role. The IA includes a language translation engine that dynamically translates Web contents from one language into another in real time.  So a Japanese speaker can ask to surf an English Web site in Japanese - the IA accesses the English Web site, extracts the content of the Web site, translates it, and reads it back to the user in Japanese.

How Well Does Voice Internet Work?
Can content really be accessed on any Web site?  Can existing Internet contents be rendered in a manner that ensures real-time delivery that is short, precise, easy to navigate, meaningful in audio, and pleasant to listen to? The answer to both is ‘Yes.’ The strength of the ‘Yes’ depends on the content of the Web page. A content-rich page with a relatively small number of links makes rendering and navigation easy since there are only a few choices, and one can quickly select a particular topic or section. If the site is rich in content, links and images/graphics, the problem is more difficult, but a solution still exists by selecting a built-in feature called ‘Page Highlights.’ The most difficult case is when a page is very rich in images/graphics and links. In such cases, the main information is located several levels down from the home page, so navigation becomes more difficult as one has to go through multiple levels. Using multi-level ‘Page Highlights’ and ‘Customized Highlights,’ the content can still be rendered well. But it is not as easy to navigate as the other two cases.  Most Internet contents fall under the first and second categories.

Looking Ahead
Use of the voiced Internet and voice-based services is growing. Recently, several carriers have deployed voice-based special services and voice portals. Local telephone carriers and Internet service providers are adding voice Internet capabilities to provide additional customer services and competitive differentiation. The next key feature to add is to allow access to the whole Internet.

Makers of appliances such as answering machines, audio gear, ATMs, robots, and anything else that should speak must also aim to comply with federal accessibility standards. Highest quality ASR and TTS should always be available and integrated seamlessly, or the new accessible service will become a disservice. A couple of years ago, an accessibility specialist at the MIT Adaptive Technology for Information and Computing (ATIC) laboratory wrote to me about his concerns for blind students.  He was "…afraid that by the time speech technology does become cheap, easily integrated and used, technology will have moved even further in the direction of an over-reliance on visual information.  The blind are always playing catch-up.  Every time a new Web technology is introduced, such as a new platform for online learning, blind people are in danger of being unable to use sites based on it…How will 3D displays and sites designed with VRML become usable by the blind?" (Caloggero, 2002)  Some encouraging answers – for both the blind and the elderly - are supplied by the developing voice Internet technologies.

Voice Internet solutions help all service providers comply with Section 508.  A voice-based interface to the Internet also benefits many people beyond those explicitly targeted by that legislation, notably the elderly. A federal agency may choose to delay Section 508 compliance if there is evidence of financial hardship. Voice Internet technology eases implementation of Section 508 because it ensures access to the Internet at a much lower cost.

Newly available speech technology couples a familiar, commonplace device (a standard telephone) with patented technology, which allows audio-only Internet browsing. This pairing provides a cost-effective, elegant solution to the accessibility issues being addressed by the federal agencies, and is further a boon to the large and growing graying percentage of the population.

Section 508
In 1998, Congress amended the Rehabilitation Act to require federal agencies to make their electronic and information technology accessible to people with disabilities. Inaccessible technology interferes with an individual’s ability to obtain and use information quickly and easily. Section 508 was enacted to eliminate barriers in information technology, to make available new opportunities for people with disabilities, and to encourage development of technologies that will help achieve these goals. The law applies to all Federal agencies when they develop, procure, maintain, or use electronic and information technology. Under Section 508, agencies must give disabled employees and members of the public access to information that is comparable to the access available to others.

Age-related Blindness
The leading cause of blindness for adults aged 55 and older is an eye disease called macular degeneration. It’s estimated that the disease currently affects 13-15 million Americans, far more than glaucoma does.  Unlike glaucoma, macular degeneration is less well understood and until recently was regarded as incurable. Symptoms include blurring or loss of central vision, which in turn controls the ability to read, drive, recognize faces or colors, and see objects in fine detail. The number of cases of macular degeneration in the U.S. will increase significantly as baby boomers age, reaching 17 million by 2020.

Caloggero, R. (2002). Personal correspondence.
Henton, C. (2002). Making TTS Real. Speech Technology Magazine, 7 (4), 12-16.
Henton, C. (2002). TTS – Some People Really Need It. Proceedings of the Applied Voice-Input/Output Society (AVIOS), San Jose, CA: 79-90.
Khan, E. (2003). Voice Internet, Voice Portal and Voice-Based Services. Military Electronics conference, Baltimore.
Moss, T. (2004). Webcredible Handbook. http://www.webcredible.co.uk/.

Caroline Henton is founder/CTO of Talknowledgy and an editorial board member for Speech Technology Magazine. She’s directed projects in speech synthesis, linguistics, localization, and VUI design for Apple Computer, Sun Microsystems, Unisys, Lexicon Naming, VCS/Philips, General Magic, DEC, Fonix, Tellme, Elan, and NeoSpeech. She has 65 technical publications, and four patents.
