A New Read on Digital Talking Books

Speech technologies and new XML standards enhance access to educational texts.
By John Oberteuffer - Posted Sep 5, 2007
Recorded books for the blind first became available in the 1930s. Analog recordings created by volunteer readers were produced as 12-inch vinyl records, about 10 double-sided discs for the average book. This early production was expensive and time-consuming, so relatively few audio texts were available.

Today, digital hardware, synthetic speech software, and powerful standards for digital talking books (DTBs) have dramatically enhanced the accessibility of print-originated material. Many DTB titles, including fiction, non-fiction, and textbooks, are available through nonprofit organizations such as Recording for the Blind & Dyslexic and Benetech Bookshare.org. Serving-plate-sized long-playing records and bulky phonographs have been replaced by CDs, solid-state memory media, and dedicated talking book players.

The content of many books, articles, and manuals is available as electronic text, enabling text-to-speech (TTS) software to read out the material on PCs. Works only available in print may be converted to electronic text using scanners with optical character recognition.

The Kurzweil Reading Machine, developed in the 1980s, allows virtually any printed text to be digitized and read out on a PC with TTS software. Screen reading interface software for PCs using TTS is widely available. A USB memory module that plugs into any PC can also provide TTS access to screen content.

The first widely available commercial synthetic speech was DECtalk, developed at MIT in the 1970s. Although it is still preferred by many visually disabled individuals for its high-speed speaking capability, a number of high-quality concatenative TTS engines, like AT&T’s Natural Voices, are now available for screen readers. These systems piece together segments of recorded human speech. TTS systems for multiple languages with different voices have been developed for a variety of computing platforms.

Today’s digital books are multimedia documents whose content includes electronic text and recorded audio files. In addition to the evolution in hardware and software for these talking books, markup languages (MLs) that define their structure have been developed. The implementation of standard MLs offers a dramatic increase in the usability of DTBs. The ML standard for multimedia digital talking books is DAISY (Digital Accessible Information SYstem), which offers a highly functional, feature-rich format for published information.

A DAISY book may include audio files, marked-up text, a file that synchronizes the audio with the text, and a control file for navigating the published information. In addition, the ML metadata provides for explanations of content elements, like sidebars and images, and support for reading mathematical formulas.
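To make the four parts concrete, here is a minimal Python sketch of how a player might hold them in memory and use the synchronization data to find the text being spoken at a given moment. The class and field names are hypothetical and chosen for illustration only; they do not reflect the actual DAISY schema or file layout.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class SyncPoint:
    """Links one text fragment to the moment its audio starts."""
    start_seconds: float  # offset into the recording
    text_id: str          # id of the marked-up text fragment


@dataclass
class TalkingBook:
    """Hypothetical container for the four parts (not the real DAISY format)."""
    audio_files: list[str]
    text: dict[str, str]      # fragment id -> text content
    sync: list[SyncPoint]     # must be sorted by start_seconds
    navigation: list[tuple[str, str]] = field(default_factory=list)  # (title, fragment id)

    def fragment_at(self, seconds: float) -> str:
        """Return the text fragment being spoken at the given audio offset."""
        starts = [point.start_seconds for point in self.sync]
        # Index of the last sync point that starts at or before `seconds`.
        index = bisect_right(starts, seconds) - 1
        if index < 0:
            raise ValueError("offset precedes the first sync point")
        return self.text[self.sync[index].text_id]


book = TalkingBook(
    audio_files=["chapter1.mp3"],
    text={
        "p1": "Recorded books first appeared in the 1930s.",
        "p2": "Today they are digital.",
    },
    sync=[SyncPoint(0.0, "p1"), SyncPoint(4.2, "p2")],
    navigation=[("Chapter 1", "p1")],
)
print(book.fragment_at(1.0))
```

A player could call `fragment_at` repeatedly during playback to decide which passage to highlight on screen.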

DTB players and computer software provide button or keyboard control for a variety of tasks. The text may be highlighted in sync with the audio (recorded or TTS), chapter titles may be listed, or the user may jump to a specific page or section in the document. Access to nonrecorded e-text information can be provided by TTS or Braille output.
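The navigation tasks described here can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the behavior of any particular player: the `Section` and `Navigator` names are hypothetical, and each section is assumed to record only its title and starting page.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    first_page: int  # page on which the section starts


class Navigator:
    """Tracks the reader's position in a list of sections sorted by page."""

    def __init__(self, sections: list[Section]):
        self.sections = sections
        self.current = 0  # index of the section being read

    def chapter_titles(self) -> list[str]:
        """List the titles a player could read out or display."""
        return [section.title for section in self.sections]

    def jump_to_section(self, title: str) -> Section:
        """Move to the section whose title matches, ignoring case."""
        for index, section in enumerate(self.sections):
            if section.title.lower() == title.lower():
                self.current = index
                return section
        raise KeyError(title)

    def jump_to_page(self, page: int) -> Section:
        """Move to the last section that starts on or before the page."""
        candidates = [
            index for index, section in enumerate(self.sections)
            if section.first_page <= page
        ]
        if not candidates:
            raise ValueError(f"page {page} precedes the first section")
        self.current = candidates[-1]
        return self.sections[self.current]


nav = Navigator([
    Section("Preface", 1),
    Section("Chapter 1", 5),
    Section("Chapter 2", 20),
])
print(nav.chapter_titles())
print(nav.jump_to_page(12).title)
```

Each method would typically be bound to a button or key, so a single press lists the chapters or moves the reading position.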

Under the federal Individuals with Disabilities Education Act of 2004, the Center for Applied Special Technology (CAST) organized the creation of the National Instructional Materials Accessibility Standard (NIMAS) format, a subset of the DAISY standards. This format is now required for all educational texts produced for state and local education agencies.

Accessibility using TTS and other modalities provided by this standard also benefits individuals with dyslexia. The physically disabled can also use the document navigation capabilities to move between sections or turn pages. For nondisabled individuals studying a foreign language, synchronized text and audio offer an additional aid to learning.

Another advantage of the NIMAS format is its potential for speech-recognition-driven access to digital talking books. Speech command menus for chapter titles, indexed words, or page numbers can be easily generated from the metadata in the NIMAS file. Speech technology offers full eyes-free, hands-free voice access to these rich, standard-format talking books.
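As a rough illustration of how such a menu might be generated, the Python sketch below builds a table of spoken commands from chapter titles, index words, and page numbers, then looks up a recognized phrase. It makes several assumptions: the metadata has already been extracted from the file into plain Python values, the speech recognizer returns a text transcript, and the command wording ("go to", "find", "page") is invented for the example. None of the function names come from NIMAS or from any speech toolkit.

```python
import re
from typing import Optional


def normalize(phrase: str) -> str:
    """Lowercase a phrase and drop punctuation so lookups are forgiving."""
    return " ".join(re.findall(r"[a-z0-9]+", phrase.lower()))


def build_command_menu(
    chapters: list[tuple[str, int]],  # (chapter title, starting page)
    index_words: dict[str, int],      # indexed word -> page
    page_count: int,
) -> dict[str, int]:
    """Map each spoken command to the page it should open."""
    menu: dict[str, int] = {}
    for title, page in chapters:
        menu[normalize(f"go to {title}")] = page
    for word, page in index_words.items():
        menu[normalize(f"find {word}")] = page
    for page in range(1, page_count + 1):
        menu[f"page {page}"] = page
    return menu


def resolve(menu: dict[str, int], utterance: str) -> Optional[int]:
    """Return the target page for a recognized utterance, or None."""
    return menu.get(normalize(utterance))


menu = build_command_menu(
    chapters=[("Chapter 1: Sound", 5)],
    index_words={"phoneme": 12},
    page_count=20,
)
print(resolve(menu, "Go to chapter 1 sound"))
```

Because the menu is an ordinary dictionary, its keys could also be handed to a recognizer as the list of phrases to listen for.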


John A. Oberteuffer is chairman of the advisory committee at Fonix Corp. and a member of the board of directors of AVIOS. He was the founder and editor of the speech industry newsletter ASRNews. He can be reached at jao@fonix.com.

