September 2, 2021
By Kevin Brown, enterprise architect, Miratech
Inside Speech

Speech Technology's Invaluable Assist

The blazing-fast progression of speech technologies into everyday life is all too often taken for granted by those outside the speech technology industry. Conversely, the leap speech technologies have made over the past decade is shocking to many of us who toiled with the technology over the previous two decades.

And perhaps no industry sector has benefited more than assistive tech.

People with sight disabilities were early adopters of text-to-speech technology, but it came at a high cost and in an inflexible form factor. A report by the KTH Royal Institute of Technology in Stockholm in 1987 stated that “approximately 500 text-to-speech systems have been delivered for the visually impaired,” but then went on to declare this about speech-to-text: “These early optimistic projections have then been revised and moderated. Especially the expectations on speech recognition have been exaggerated. The key concept of a voice-activated typewriter will still take decades to develop in its general form.”

In 1990, at TNT Express Worldwide’s Sydney office, I witnessed the implementation of JAWS (Job Access With Speech), an MS-DOS text-to-speech application that allowed a blind employee to provide invoicing services faster than her coworkers. This experience raised my awareness of the expanded possibilities that lay ahead for assistive speech technologies.

Throughout the ’90s, assistive speech technologies progressed slowly, mainly those based on text-to-speech. But in 1997, Dragon NaturallySpeaking came to market and was arguably the first mainstream consumer speech-to-text application that worked well enough to gain significant adoption. Physically disabled people now had an affordable means to easily dictate communication.

Ten years later, in June 2007, the iPhone was released. Android followed a year later, and soon after, Google voice search was available on both mobile platforms. For the first time, the combination of handheld computing power and network-enabled speech recognition gave developers a platform for building assistive functionality at a reasonable price.

In 2011, Apple turbocharged speech recognition growth by introducing the voice assistant Siri, fueled by Nuance’s speech recognizer coupled with the newly acquired Siri AI spinoff from SRI International. Google followed in 2012 with Google Now (which morphed into Google Assistant), and Amazon launched Alexa on smart speakers in 2014 and later on mobile devices. The battle was on for speech-enabled assistants.

Using these platforms, many third-party developers have rolled out smart home capabilities for the mass market that are also enormous time and effort savers for those with physical challenges. Using your voice to order food, adjust a thermostat, turn lights on and off, adjust blinds, answer a doorbell, lock doors, or turn on security systems seems like a slick trick that saves a few seconds. But for those with physical disabilities, these activities can take much longer or be simply impossible.

One interesting case is Furbo, a video camera, microphone, and speaker combined with a treat dispenser for pets. Originally created for pet owners who wanted to stay in contact with homebound pets from the office, it has taken on an important role in the physically challenged community as an excellent way to continue the training of assistive pets. It is fully integrated with Alexa, and additional skills are constantly being added.

Even small functionality increases from speech technology can literally be life-changing. An example is the open-source xDrip+ Android mobile application for blood glucose monitoring. The app can speak glucose levels and alerts for custom trends and levels, allowing diabetics to address trends before they become too serious.

Though mobile speech-enabled assistive apps are continuing to flourish, with several hundred in both Google’s Play Store and Apple’s App Store, plenty of stand-alone products are also being developed to make life easier and more accessible for those of us with disabilities. An example is OrCam from Reading Helpers; the OrCam Pro product mounts on eyeglass frames and provides reading capabilities via text-to-speech without internet connectivity, helping those with low vision or dyslexia. It also provides color identification (color blindness in some business roles can be a major impediment), currency identification, product identification, and facial recognition. If you rely on this functionality throughout the day, this form factor is much easier and more convenient than pulling out a mobile phone and pointing.

For those in the speech technology arena, developing assistive capabilities is a wonderful way to help others, and you are likely to have fun doing it!

Kevin Brown is a customer experience architect. He has more than 25 years of experience designing and delivering speech-enabled solutions. He can be reached at kevin.brown@voxperitus.com.
