Enabling the Disabled
Assistive technology has come a long way in serving a sometimes marginalized population. A host of innovative products and services are now available to aid those who cannot speak, hear, see, or move, with offerings that run the gamut from voice-controlled wheelchairs to Roger Ebert's synthesized voice, all thanks to advancements in speech technology.
From Assistive to Main Street
Many of today's mainstream speech-enabled offerings were born of the assistive market.
"A lot of times, people with disabilities are kind of like the canaries in the coal mine," says Robin Springer, an attorney and president of Computer Talk, a consulting firm specializing in the design and implementation of speech recognition and other hands-free technology services. "The technology starts because of people with disabilities—it starts as assistive technology, which is how speech recognition also started."
That was the case with Nuance Communications and its popular Dragon speech recognition software.
Introduced 16 years ago, "Dragon's roots are as an assistive technology tool," says Erica Hill, product manager at Dragon. "Our initial customers were people who had a physical need for a speech interface for the PC because they couldn't access their computer through the traditional method of a keyboard or mouse."
Dragon provides a voice interface to the PC through Dragon NaturallySpeaking and Dragon Dictate for Mac, enabling users to dictate straight text and commands and control a computer with simple voice commands.
About 10 to 15 percent of Dragon customers use it because of some sort of physical disability, including visual impairments, loss of limbs or limb usage, paralysis, multiple sclerosis, and amyotrophic lateral sclerosis (ALS).
Dragon technology is also used in products for the visually impaired, such as J-Say Pro, which provides access to PC applications. J-Say Pro brings together Dragon NaturallySpeaking Professional version 11 and JAWS (Job Access With Speech) software from Freedom Scientific. JAWS is an accessibility solution that reads information on a PC screen using multilingual speech synthesizers. Another product for the visually impaired, MagniTalk, provides direct speech access to the ZoomText magnification interface through NaturallySpeaking Professional voice commands.
Nuance's disabled users also include those with learning disabilities such as dyslexia, dysgraphia, autism, and ADHD. "There are different learning challenges that present challenges in terms of transferring what's in [users'] minds onto the screen," Hill says. "For young people, for example, it's much easier and natural for them to don a headset and be able to simply speak their mind instead of having to worry about the mechanics of writing. We've found that Dragon has unlocked the door to success for a lot of these kids."
Reading the Web With Your Ears
At a recent TechCrunch Disrupt conference, start-up SpokenLayer released a public beta of an app that transforms written Web content into audio on the fly. The app either passes an article's body text to a human to read, record, and publish in the company's system, or employs text-to-speech synthesis to voice articles with sub-second turnaround, meeting instant demand for content that has not yet been recorded by a human voice.
According to the company, the app uses both text-to-speech and human voices because neither method alone works at scale.
"There's far too much content produced in text in any given day to pass through human readers in a scalable way," says SpokenLayer founder and CEO Will Mayo. "Conversely, although text-to-speech has come an incredible distance, some people are more sensitive to the rough edges that still persist than others. Merging the two possible solutions by analyzing the popularity of content, what we've created is a system in which popular articles are quickly read by human voices, and the long tail of articles can be delivered on demand. Essentially, we've combined the strengths of both solutions, and removed the weaknesses as a side effect."
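The hybrid routing Mayo describes — human narration for popular articles, synthesis for the long tail — can be sketched as a simple popularity threshold. This is a hypothetical illustration, not SpokenLayer's actual code; the threshold, field names, and function are assumptions.

```python
# Hypothetical sketch of SpokenLayer-style hybrid narration routing:
# popular articles queue for human narration; the long tail falls back
# to on-demand text-to-speech. All names and values are illustrative.

POPULARITY_THRESHOLD = 1000  # e.g., daily page views (assumed metric)

def choose_narration(article):
    """Return which pipeline should voice this article."""
    if article["recording"] is not None:
        return "cached_human_audio"   # already narrated by a person
    if article["views"] >= POPULARITY_THRESHOLD:
        return "human_queue"          # popular enough to merit a human read
    return "tts"                      # synthesize on demand, sub-second

articles = [
    {"title": "Breaking story", "views": 52000, "recording": None},
    {"title": "Niche archive piece", "views": 37, "recording": None},
]
for a in articles:
    print(a["title"], "->", choose_narration(a))
```

In a real system the popularity signal would be continuous, so an article served by synthesis today could be promoted to the human queue tomorrow.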
SpokenLayer has partnered with several content providers, including the Associated Press, The Atlantic, TechCrunch, AOL Tech, and The National Journal, among others. Right now the app is only available through the Apple store, but Mayo says the company is working on an Android version.
"While we're working with a lot of new publishers, we see the next version having users select the content they want as opposed to us selecting it for them, and providing a Pandora-type experience for them," Mayo says, referring to the Web-based personalized radio.
Building a Better App
Data By Design has launched a new app for the iPad, My Care Share, an enhanced version of The Fat Finger app, which is a text-to-speech program for people who have trouble selecting keys from a standard keyboard. My Care Share uses iSpeech cloud technology that combines text-to-speech and automated speech recognition technologies with application programming interfaces (APIs) and software development kits. The app is for those with disabilities such as apraxia, dysarthria, dysphasia, and autism, who may suffer from limited fine motor skills or vision problems.
As an example of how My Care Share works, suppose the user wants to select the letter "A" from the virtual keyboard but accidentally taps "W." A larger pop-up menu then displays the letters from the quadrant the user tapped, and the user picks a letter from this enlarged menu; the program speaks the letter as it is entered into the intended phrase. My Care Share also lets users communicate with a caregiver by selecting a photo with an associated phrase: a picture of the bathroom, for example, would read aloud, "Would you mind taking me to the bathroom?"
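The two-step selection described above — a coarse tap on a quadrant, then a precise pick from an enlarged menu — can be sketched as follows. This is a minimal illustration of the interaction pattern, not Data By Design's implementation; the quadrant layout and function names are assumptions.

```python
# Sketch of a quadrant-then-enlarge key selection scheme: an imprecise
# tap anywhere in a quadrant brings up that quadrant's letters enlarged,
# so a user aiming for "A" who hits "W" still reaches "A" on the second tap.

QUADRANTS = {
    "upper_left":  list("QWERTASDFG"),  # "A" and "W" share a quadrant
    "upper_right": list("YUIOPHJKL"),
    "lower_left":  list("ZXCV"),
    "lower_right": list("BNM"),
}

def enlarged_menu(quadrant):
    """First, imprecise tap: return the quadrant's letters for enlarged display."""
    return QUADRANTS[quadrant]

def confirm_letter(quadrant, letter, phrase):
    """Second tap on the enlarged menu: speak the letter and append it."""
    assert letter in QUADRANTS[quadrant]
    print(f"speaking: {letter}")  # stands in for the text-to-speech call
    return phrase + letter

# User aims for "A" but hits "W"; both fall in "upper_left", so the
# enlarged menu still offers "A".
phrase = confirm_letter("upper_left", "A", "")
```

The point of the design is that the first tap only needs quadrant-level accuracy, which is achievable with limited fine motor control.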
"My wife, a speech pathologist, noticed that her stroke patients were frustrated with their inability to key information into their iPads for apps they were currently using," says Data By Design founder Robert DiGiacomo. "This app was designed to assist those who have difficulty typing into a text-to-speech program. The user would just simply select a photo from the left column and a clear voice would then read the [corresponding] phrase aloud. You can download your own photos and create your own phrases, or choose from the many photos and phrases pre-installed for you. You can sort the phrases in the order that you feel better fits your needs."
Speech and Mobility
According to GeckoSystems International, approximately 2.2 million people in the United States use a wheelchair for everyday activities, and approximately 15 percent of those use power wheelchairs.
Thanks to new technology, wheelchair users can now use their voice to get around. The company's GeckoChat verbal interaction software is integrated with its SafePath wheelchair navigation solution for use by quadriplegics and others who have control of speech but are unable to manipulate a joystick control.
Those who use both GeckoChat and the company's SafePath mobile robot solution can enter an unknown indoor space and safely explore it by voice. Another SafePath component, GeckoNav, avoids obstacles and steers to endpoints. Users can mark endpoints and waypoints in the system so that a single command can replace a series of verbal commands.
GeckoChat uses speech recognition/synthesis protocols available in 13 languages. It was created in 2003 for GeckoSystems' elder care and personal assistance robot, the CareBot, to provide medication and scheduling reminders and record compliance for the care receiver.
Roger Ebert Speaks Again
Two years ago, film critic Roger Ebert regained his voice, thanks to speech technology from Edinburgh, Scotland–based CereProc (short for cerebral processing).
The company was founded in 2005 by Matthew Aylett and Nick Wright of the Edinburgh–Stanford Link speech research fund. They realized the latest advances in speech synthesis meant artificial voices could be made to sound "real." Working with Chris Pidcock, a world-class speech synthesis expert, CereProc developed advanced text-to-speech technology.
An early project was the Bush-o-Matic, a video and speech program allowing users to input text into a screen featuring the head of former president George W. Bush. The talking head synthesizes the inputted text into Bush's voice.
Ebert, who lost his voice following several cancer operations, approached CereProc, thinking that if the company could reproduce the former president's voice, they could do the same for him.
"What's happened in the past is that people have tried to make voices sound very, very neutral, which is fine for some applications," Aylett says. "But now we want the voice to be engaging in some ways, and that means recording material which gives a better sense of the person."
In 2010, the company began working on what they dubbed "Roger Junior." The company's sophisticated software broke down Ebert's voice into phonemes by listening to audio from Ebert's CD commentaries dating back 10 years.
"You put stress on different words," Aylett says. "This affects the way sounds are produced. What makes it more difficult is that different words are affected by the sounds next to them and the context they're in. You need to record quite a lot of samples of different sounds to produce something that sounds natural. Basically, you take a lot of speech, cut it up into tiny pieces, and rearrange it [to] produce new sentences. That's what our system does, but automatically."
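The cut-up-and-rearrange process Aylett describes is concatenative (unit selection) synthesis. The toy sketch below shows only the final splicing idea — recorded speech indexed as small units, looked up and reassembled into new utterances; real unit selection also scores candidate units by context, stress, and join cost. The unit inventory and data here are invented for illustration.

```python
# Toy sketch of concatenative synthesis: recorded speech is cut into
# small indexed units, then rearranged to voice new sentences.
# Real systems pick among many candidates per unit; this just looks up one.

UNIT_DB = {  # unit name -> (recording_id, start_ms, end_ms); made-up data
    "h-e": ("clip_01", 120, 180),
    "e-l": ("clip_04", 300, 360),
    "l-o": ("clip_02", 90, 150),
}

def synthesize(units):
    """Return the ordered list of audio segments to splice together."""
    return [UNIT_DB[u] for u in units if u in UNIT_DB]

segments = synthesize(["h-e", "e-l", "l-o"])  # fragments spelling "hello"
```

Because each unit comes from a different recording, the quality of the result depends on how many samples of each sound, in each context, exist in the voice bank — which is why CereProc mined ten years of Ebert's commentary tracks.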
CereProc is working with former New Orleans Saints player Steve Gleason, who has ALS, to collect voice samples and create a text-to-speech program with accessibility software. After his diagnosis, Gleason started a video diary for his son.
"I began keeping a video journal library where I in some way discuss topics, to essentially share my heart and my mind with my son. It's possible we will never have a voice-to-voice conversation, but it's impossible that we will not have a conversation," Gleason said at a United Nations meeting. "I began working with CereProc recording my voice to a voice bank. We have created a synthetic voice that sounds like my own voice. I hope you like it because you will be hearing from me for decades to come."
Aylett says that CereProc is working with other individuals, but at this point it's trying to automate the entire process so that the system becomes more cost efficient and can reach a wider audience.
Giving the Deaf a Voice
There are no precise figures on how many hearing-impaired people there are in the world. According to Gallaudet University, the premier U.S. university for the deaf and hard of hearing, hearing-impaired people have not been specifically counted in the U.S. Census since 1930. The last census of the U.S. deaf population was privately conducted in 1971; for figures since then, only estimates are available. Based on demographic surveys, the school's Research Institute believes that there are a little over 38 million people in the U.S. categorized as having "hearing problems." That market, once woefully underserved, has recently seen a flurry of new solutions and products.
Originally developed as a productivity-enhancing tool for voicemail management in the enterprise, Mutare's giSTT speech-to-text application has found a following among those who are unable to use voicemail because of hearing impairments. giSTT automatically delivers a text transcription of voicemail messages to users' email inboxes or mobile devices so they can read, rather than listen to, the content of the message. The application works with virtually any enterprise voicemail platform and any email system, requires no desktop client to install or maintain, and is available at minimal cost.
Ed O'Brien, Mutare's chief information officer, who is hearing impaired himself and a member of the giSTT product development team, says the solution is a win-win for the hearing and hearing impaired alike.
"We get feedback constantly from our customers who tell us they can't imagine ever going back to traditional voicemail," he says. "But for the hearing-impaired user, the accommodation is especially significant. And one would be hard-pressed to consider the cost of less than one dollar per week anything but reasonable."
SpeechTrans, a start-up provider of app-based multilanguage translation, recently launched SpeechTrans Ultimate for Hearing Impaired. The solution enables the hearing impaired to conduct real-time, two-way conversations in English, Spanish, French, Japanese, Italian, German, and Mandarin Chinese without the need for sign language or an in-person translator.
With the app, each party speaks into the mobile device and the translated text automatically appears. Type-to-type translations are also available for situations that require quiet, or for those who have trouble speaking because of a disability. Because it is difficult for hearing-impaired individuals to determine when someone has finished their part of the conversation, the hearing-impaired version automatically detects when a person stops speaking and immediately begins the translation.
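The end-of-turn detection described above is typically done with energy-based endpointing: when the audio's energy stays below a silence threshold for long enough, the speaker's turn is treated as finished and translation is triggered. The sketch below is a generic illustration of that technique, not SpeechTrans's implementation; frame size, thresholds, and names are assumptions.

```python
# Generic energy-based end-of-utterance detection: a sustained run of
# low-energy frames marks the end of a speaker's turn.
# Thresholds and frame length are illustrative assumptions.

SILENCE_THRESHOLD = 0.02   # RMS energy below this counts as silence
END_OF_TURN_FRAMES = 30    # ~0.6 s of silence at 20 ms per frame

def rms(frame):
    """Root-mean-square energy of one frame of audio samples."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5

def detect_end_of_turn(frames):
    """Return the index of the frame where the turn ends, or None."""
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) < SILENCE_THRESHOLD:
            silent_run += 1
            if silent_run >= END_OF_TURN_FRAMES:
                return i
        else:
            silent_run = 0  # speech resumed; reset the silence counter
    return None
```

Tuning the silence window is the key trade-off: too short and mid-sentence pauses trigger premature translation, too long and the conversation feels laggy.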
The app features Facebook chat integration, which allows users to type in their native language and have the other party receive translated output on a SpeechTrans-enabled iPhone or iPad, or directly as text on a computer. It comes with 1,000 preloaded transcriptions, compared to the 400 available in the SpeechTrans Ultimate app; every use that requires voice recognition counts as one "transcription," while text-to-text and text-to-speech transcriptions are unlimited. An additional 100 transcriptions are available to users of Facebook's "Suggest Friends" feature within the app. The SpeechTrans Ultimate Hearing Impaired app is priced at $50 and is available in Apple's App Store; an Android version has just been released.
"This began when we got an inquiry from a customer in Spain who was hearing impaired and thought our technology could be tweaked to be like closed captioning on television, but could be used in real life," explains Yan Auerbach, cofounder and COO of SpeechTrans. "Instead of having to get a pen and paper, this facilitates communication happening almost instantaneously without the need for sign language."
Moving from the practical to the seemingly fantastical, a team of Ukrainian students recently won Microsoft's Imagine Cup competition with their invention of gloves that can translate sign language into speech.
Dubbing themselves the quadSquad, the team demonstrated Enable Talk gloves, which are able to translate signs through a text-to-talk engine connected to a smartphone.
The gloves capture hand movements and relay the sign to a mobile device, where an application matches the movement pattern against stored signs and plays the sound for that sign. Enable Talk builds 11 flex sensors, eight touch sensors, an accelerometer/compass, and an accelerometer/gyroscope into the gloves. The sensors gather raw data and transmit it to a microcontroller, which normalizes the data and sends it over a Bluetooth module to the mobile device, where the gestures are matched against the stored sign patterns. When a pattern is recognized, the text equivalent of the sign is generated and, using Microsoft speech and Bing APIs, the sound is played through the mobile device's speaker.
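The matching step can be illustrated as a nearest-neighbor lookup: each normalized sensor reading is compared against stored sign templates, and the closest match within a tolerance yields the text to speak. This is a simplified sketch of the general technique, not quadSquad's code; the feature vectors, tolerance, and names are invented for illustration.

```python
# Sketch of template matching for glove-based sign recognition:
# compare a normalized sensor reading to stored sign templates and
# return the nearest match, or None if nothing is close enough.

SIGN_TEMPLATES = {  # sign -> normalized sensor feature vector (made-up data)
    "hello":     [0.9, 0.1, 0.1, 0.8, 0.2],
    "thank you": [0.2, 0.9, 0.8, 0.1, 0.1],
}
MATCH_TOLERANCE = 0.5  # maximum distance for an accepted match

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def recognize(reading):
    """Match one sensor reading to the nearest stored sign, if any."""
    best_sign, best_dist = None, float("inf")
    for sign, template in SIGN_TEMPLATES.items():
        d = distance(reading, template)
        if d < best_dist:
            best_sign, best_dist = sign, d
    return best_sign if best_dist <= MATCH_TOLERANCE else None

print(recognize([0.85, 0.15, 0.1, 0.75, 0.25]))  # prints "hello"
```

In practice a sign is a gesture over time, so real recognizers compare whole sensor trajectories rather than single readings, but the match-against-templates idea is the same.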
The quadSquad estimates that initial start-up costs will be $40,000, that the base price will be $20 per glove at mass production, and that the final retail price will be approximately $100 per glove.
To the average user, solutions that originate as assistive technology may simply translate into hours saved on everyday tasks, but for the disabled, they mean equal access to worlds they may previously have been unable to enter.
Staff Writer Michele Masterson can be reached at firstname.lastname@example.org.