August 25, 2003
By Judith Markowitz, Principal, J. Markowitz, Consultants
Forward Thinking

I See What You Are Saying

There's no doubt that speech recognition is an assistive technology. Most of us are familiar with the use of dictation and voice-controlled desktop navigation tools by people with repetitive stress injuries (RSI). I've also seen a myriad of voice-activated implementations for people with limb paralysis and weakness that have included hospital beds, wheelchairs, environmental control systems and a complete feeding system (it was experimental and hadn't resolved problems related to the administration of liquids). There are also command-and-control systems for people with severe visual impairments, such as a voice-activated photocopier developed at Pitney Bowes.
Now the American Sign Language project at DePaul University's School of Computer Science, Telecommunications and Information Systems is building a system that will guide deaf people through auditory minefields. "It was originally proposed by a deaf graduate student who suddenly found herself surrounded by angry, gun-toting airport security people," explains Professor Rosalee Wolfe, who heads up the project. "They became alarmed when she failed to stop when asked - even when they shouted at her. She was terrified. Every deaf person who travels has at least one story like this."
In fact, it's remarkable how much airport security depends upon effective verbal communication. In the few minutes I spent at security screening in one airport I heard the following: "Wait! Don't go through until your bag has gone." "Go through again." "Don't touch your bag." "Which one of these is your bag?" None of them is a good candidate for a printed sign yet they are all vitally important.
It's also not feasible to rely on lip reading for such critical information. When I was a speech pathologist I taught lip reading to hearing-impaired adults. It's hard. Only around 10 percent of spoken language is visible. Even if you think you may have recognized a familiar pattern you still don't know how it fits into the whole instruction.
VISIBLE SPEECH
The goal of the DePaul researchers is to capture spoken instructions and convert them into the fourth most widely used language in the United States - American Sign Language (ASL). "This involves transforming verbal communication into an animated visual format," says graduate student Sunny Srinirasan. "It's really a machine-translation project where the translation is from sounds to hand movements and positions."
In fact, ASL is a full-fledged language with its own semantics, syntax and visual phonology, and it traces back to French (not British) origins. "Signs are not undifferentiated units, either," adds graduate student MaryJo Davidson. "They are made up of features called cheremes - from the Greek word for hand." The basic features that differentiate signs in ASL are: hand movement, hand orientation (e.g., palm up, slanted), hand shape (e.g., straight, curved) and non-manuals like facial expression. So, the same hand movement and shape with a different hand orientation is an entirely different sign. For example, the words name and sit (shown in the pictures) have identical hand shapes and movement but differ in hand orientation.
The same thing happens in spoken language. For example, in English the phonemes 's' and 'z' differ only in whether you use your voice while saying them. Yet substituting an 's' for the 'z' phoneme at the end of the word his changes it to hiss.
COMPUTERIZATION
Although the original impetus was to create an assistive tool for airport security, the DePaul researchers are actually trying to build a generic tool that can be used in a variety of structured business and educational contexts, including ASL training for healthcare workers. To accomplish this goal, Wolfe's team is feeding speaker-independent speech input into a structured grammar that defines the expected variability in the input utterances. The input is converted into an interlingua - an internal representation that can be used to generate ASL sequences.
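For readers who think in code, the two ideas above (signs as bundles of cheremes, and a structured grammar that maps expected utterances onto an internal representation) can be sketched roughly as follows. This is an illustration only, not the DePaul system: the class and function names, the regular-expression grammar, the frame fields and every feature value are assumptions made up for the example. The only facts taken from the article are that name and sit share hand shape and movement and differ in orientation.

```python
import re
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Sign:
    """A sign as a bundle of cheremes. All feature values are placeholders."""
    gloss: str
    hand_shape: str
    movement: str
    orientation: str
    non_manual: str = "neutral"


def differing_features(a, b):
    """Return the names of the features on which two signs differ."""
    return [f.name for f in fields(Sign)
            if f.name != "gloss" and getattr(a, f.name) != getattr(b, f.name)]


# Hypothetical values: same hand shape and movement, different orientation.
NAME = Sign("NAME", hand_shape="shape-A", movement="move-A", orientation="orient-1")
SIT = Sign("SIT", hand_shape="shape-A", movement="move-A", orientation="orient-2")

# Toy structured grammar (an assumption): each pattern covers the expected
# variants of one instruction and maps them to a single interlingua frame.
GRAMMAR = [
    (re.compile(r"^(please )?(do not|don't) touch (your|the) bag$"),
     {"act": "TOUCH", "object": "BAG", "negated": True}),
    (re.compile(r"^(please )?(put|raise|lift) your arms( up)?$"),
     {"act": "RAISE", "object": "ARMS", "negated": False}),
]


def to_interlingua(utterance):
    """Normalize recognized text and return its frame, or None if out of grammar."""
    text = re.sub(r"[^a-z' ]", "", utterance.lower())
    text = re.sub(r"\s+", " ", text).strip()
    for pattern, frame in GRAMMAR:
        if pattern.match(text):
            return dict(frame)  # copy so callers cannot mutate the grammar
    return None


if __name__ == "__main__":
    print(differing_features(NAME, SIT))            # ['orientation']
    print(to_interlingua("Don't touch your bag."))  # the TOUCH/BAG frame
    print(to_interlingua("Where is gate B12?"))     # None
```

The point of the frame is that "Don't touch your bag" and "Please do not touch the bag" land on the same internal representation, so sign generation only has to handle one case per instruction. Utterances outside the grammar return nothing, which is the trade-off of a structured approach.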
From that internal representation, cheremes and other elements are used to construct the signs, which are performed by an animated figure named Paula (see pictures). As with speech synthesis, making the output smooth and natural looking is a challenge. "In order to make the movements and positions seem natural and flow into each other you have to have much more detail and precision than you would in normal animation," explains Davidson. In addition, Paula has been created with unusually large hands and face to make her signing easier to read but, unlike animations by other research groups, she has not been given extra joints in her fingers and thumbs.
AIRPORT SECURITY
In addition to developing the core components of the system, Wolfe's team is designing modules for five airport-security operations. Here are some examples:
General Instructions - "Only ticketed people beyond this point."
Procedures - "Take your laptop out of its case."
Wanding - "Put your arms up."
Hand Inspection of luggage - "Don't touch your bag."
Finishing - "You can go now."
The American Sign Language project has generated interest among local and federal authorities but, as with many government-related projects, the wheels move slowly.

Dr. Judith Markowitz is the associate editor of Speech Technology Magazine and is a leading independent analyst in the speech technology and voice biometric fields. She can be reached at (773) 769-9243 or jmarkowitz@pobox.com.
