July 30, 2018
By James A. Larson, program co-chair, SpeechTEK 2021
Forward Thinking

8 Ways Advances in Speech Recognition Will Affect Our Lives

To borrow from noted technologist Bob Dylan, the times they are a-changin’ in the speech industry. Speech recognition has improved dramatically over the years and will continue to evolve and support new applications. We’ve come a long way from recognizing discretely spoken words to continuous speech recognition, from recognizing words in a small vocabulary to recognizing words in a very large vocabulary, from speaking simple commands to formulating complex queries, and from speaking in carefully controlled environments to talking on cell phones in noisy public areas.

We’ve indeed come a long way, but to paraphrase that other great sage, Dr. Seuss: Oh, the places we’ll go. Here are some developments and technologies that are coming soon or may already be upon us:

Goodbye, passwords: Those hard-to-recall strings of digits, characters, and symbols will be replaced by speaker identification at the beginning of a dialogue and periodic speaker authentication during the dialogue. Combining speaker identification with other forms of user identification, such as facial recognition, fingerprint identification, and challenge dialogues, will thwart fraudsters trying to hack sensitive systems.

Emotions matter: Dialogue systems will listen for words in the user’s speech that convey emotion and examine a user’s speech for characteristics that provide clues for what the user is feeling. Interactive voice response (IVR) systems will transfer users from automated agents to human agents when the IVR system detects user frustration, or will offer users additional products or bargains when users hesitate to make a choice.

Karaoke made easy: Advanced speech processing systems can recognize when a person sings off-pitch and correct the note to the appropriate pitch. A karaoke system equipped with this technology could listen to singers and replace off-pitch vocalizations with on-pitch vocalizations. This would prove very useful for the tone-deaf crooners among us.

Voice reveals your story: Several user traits can be detected from voice characteristics, including age, gender, height, and weight. Obtaining descriptions from voice calls and voice recordings could prove useful in police work. In addition, it is possible to screen for any of several medical conditions that may affect the muscles in the user’s throat or the nerves that control those muscles. Patients could be screened for early-onset Parkinson’s or Alzheimer’s disease. Rather than spend the night in a hospital, patients could be screened for sleep apnea during a five-minute phone call.

A verbal “walk the line”: While a breathalyzer measures levels of alcohol, it can’t instantly assess whether a speaker is impaired by other drugs or by a medical condition. Speech recognition systems can detect slurred speech and mispronunciations caused by alcohol, drug use, or medical issues. When pulled over by police equipped with speech recognition, a driver may instantly give away drug or alcohol use through a verbal exchange alone.

Lies, lies, and more lies: An automatic speech recognition (ASR) system could measure changes in stress, one component of detecting lies. That raises the possibility of speech recognition doubling as a lie detector that determines when a speaker is being deceptive. Perhaps ASR systems could be used to detect, among other things, when politicians are attempting to mislead the public.

Talk with the animals: Some pet owners can recognize different kinds of growls and barks. Could speech technology recognize when a dog wants to eat, play, go outside, be petted, or have you go away? While it is doubtful that pets can discuss complex existential questions, they could make their basic wants known to their owners.

The butler did it: In-home devices, such as Amazon Echo and Google Home, act as butlers by answering questions, controlling our environment, and performing tasks. Some devices, such as the Jibo robot, will interact with homeowners using natural language and gestures.

These are only some of the developments happening now or coming soon as new applications take advantage of large voice databases, machine learning technology, and other advances in speech recognition. Buckle up and get ready for accelerating changes in speech technology that will affect nearly every aspect of our lives.

James A. Larson is the program chair of SpeechTEK 2018 and an adjunct professor at Portland State University. He can be reached at jlarson@infotoday.com.