January 6, 2005
By Robin Springer, president, Computer Talk
Voice Value

The Case for Augmented Speech

From making airline reservations to confirming postage rates, consumers are increasingly accepting applications that use synthesized speech. The public can be unforgiving about the naturalness of synthesized speech, demanding that speech applications sound as human as possible. But could people be forming that preference based on incomplete information?

Consider that features and requirements of Assistive Technology (AT) tend to be precursors to those of the general public. But while mainstream applications incorporate concatenated speech, AT products produce the most profound results using formant-based technology. And while the general public balks at the "quality" or naturalness of formant-based speech, the AT community prefers it.

For example, Augmentative and Alternative Communication (AAC) Devices enable people who cannot speak to communicate. The devices range from simple picture boards to more complex Voice Output Communication Aids (VOCA) that use speech synthesis to give a voice to the user.

VOCA AAC Devices have been commercially available with synthesized speech for the last 25 years, so acceptance of speech synthesis is nothing new in AT. But where do AAC Devices fit into the picture of newer-generation speech technology? How much has the quality of the speech improved? The naturalness? It might be surprising, but the answer is, "Not much." According to Barry Romich, chairman and CEO of the Prentke Romich Co., "We're not further than we were a couple of years ago in improving the naturalness of synthesized speech [in AAC Devices]. There is no data to show that success."

Why, then, have those in the disability community been spending thousands of dollars per unit for technology that is not improving at a pace commensurate with mainstream products? When naturalness is weighed against intelligibility, it has been found that the more critical the delivery of information, the more intelligible the speech must be. By contrast, when a listener is able to anticipate the information, a more natural voice is preferred.

For people who cannot speak, the most important factors are being able to say what they want to say and saying it as fast as they can. They need to be understood correctly the first time. The naturalness of the speech is secondary. While abandonment rates of VOCA in some studies are as high as 30 percent, disuse occurs because clients are using the wrong device, not because the speech doesn't sound natural enough.

There are software solutions for AAC on standard PCs, but most devices are dedicated units. From a reliability standpoint, feature sets required for AAC are not inherent in standard PCs. For example, a typical AAC user may mount the AAC device on a wheelchair. The device needs to withstand bumpy roads, rain, and other environmental factors, and it needs to accommodate accessibility switches and alternative interfaces.

And if the PC with integrated AAC crashes, where does a user take it for repair? Who is responsible for system integrity of the standard computer features versus the augmentative communication features? What about a loaner unit? If a loaner is not available, the user will be without a voice until the repairs are complete.

Adding to the predicament, Medicare and Medicaid will not purchase a device that can be used as a standard computer, a significant deterrent because AAC devices can cost as much as $8,000 per unit.

New AAC devices continue to blur the lines between traditional computers and AAC. Products including PRC's Pathfinder and Dynavox's Series 4 products are dedicated devices built on the Windows CE platform, providing users with increased computer power in addition to selection methods that are not available on traditional computers. With a wireless link between the computer and AAC device, users can control the computer by emulating the keyboard and mouse with the AAC device.

According to many in the AT industry, DECTalk by Fonix is the industry standard for augmentative communication because of its intelligibility and flexibility. It has nine voices, including a child's voice, and is available in seven languages. It enables users to modify individual parameters for specific words, allowing them to express emotion and even sing.

If the technology does not work on a practical level, people will stop using it, which is why John Oelfke, vice president of embedded solutions at Fonix, believes DECTalk's success indicates intelligibility is more of a user issue than people think it is. "The goal is to improve naturalness without compromising intelligibility," Oelfke says. "We are continuing to move toward the middle."

Currently, as many as 66 percent of speech-language pathologists (SLPs) have clients who use or would benefit from using AAC, but of the more than 200 accredited university programs for SLPs in the United States, only a few require or even offer a course in augmentative communication. The result is that professionals entering the workforce do not have adequate training to assess clients for AAC but are expected to do so anyway. Under-funding of assessments also makes it difficult to ensure that proper product choices are being made.

Katya Hill, executive director of the AAC Institute, says, "We don't need a new gadget to make you talk better. We need to understand how language is represented on these devices." Hill believes that if professionals looked at how success is being achieved with today's technology, they would make different product recommendations.

The future of text-to-speech in AAC may promise more voices, increased control of the voices, and inclusion of variables so a user can individualize the sound of her voice.

We can continue to learn about synthesized speech by examining what works in AT, determining the factors of success, and emulating them in the mainstream market. Romich reminds us, "When people purchase AAC devices they are buying service and language, not technology."

Robin Springer is the president of Computer Talk ( www.comptalk.com ), a consulting firm specializing in the design and implementation of speech recognition and other hands-free technology services. She can be reached at (888) 999-9161 or contactus@comptalk.com.
