Speech Recognition Is Here (Finally). Now the Real Work Begins

Those who have been involved with speech technology for many years remember the annual mantra, “Speech recognition will come of age next year!” It was repeated so often that it became a trite saying, eventually applied to various other new technologies with a nod back to speech recognition. Yet here we are: speech recognition is part of everyday life, accepted by the masses as suddenly as the electric light bulb once was.

Recently, several technology pundits have dismissed new speech-enabled solutions because, to paraphrase, “pretty much everything is just based on a Google or Alexa API.” Rejecting solutions as not innovative because they are not fully provided by a single vendor is misguided. Like electric light, present-day speech-enabled technologies don’t deliver their benefits through a single vendor; various parties apply their expertise to add value throughout the life of these solutions. But as speech moves to the enterprise, buyers and providers will need to be cognizant of these arrangements and their implications for data security.

During the long period when speech recognition was touted as the next great thing, the focus was on speech to text via statistical analysis of phonemes (please forgive the shorthand description). Though it seems silly today, back in 1986 IBM announced that its Tangora system could actually anticipate upcoming phonemes through the use of hidden Markov models, prompting predictions that ubiquitous speech recognition was right around the corner.

Today’s acceptance of speech recognition is the result of combining contextual understanding with highly tuned statistical modeling to overcome the obstacles in language that confound even humans. A quick look at any social media feed will demonstrate that homophones such as “there,” “their,” and “they’re” apparently require high-powered cloud computing to get right.
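To make the role of context concrete, here is a minimal sketch of how neighboring words can settle a choice among sound-alike words. It is a toy illustration, not how any vendor's recognizer works: the tiny corpus, the bigram counting, and the names `SOUND_ALIKES`, `CORPUS`, and `pick_sound_alike` are all assumptions invented for this example.

```python
from collections import Counter

# Words that sound the same and must be told apart by context.
SOUND_ALIKES = {"there", "their", "they're"}

# Made-up training text; a real system would learn from vastly more data.
CORPUS = (
    "their car is over there . "
    "they're going to their house . "
    "there is a problem . "
    "they're late again . "
    "put it over there . "
    "i like their dog ."
).split()

# Count how often each pair of adjacent words occurs in the corpus.
BIGRAMS = Counter(zip(CORPUS, CORPUS[1:]))


def pick_sound_alike(prev_word, next_word):
    """Return the candidate seen most often between these two neighbors."""
    def score(candidate):
        return BIGRAMS[(prev_word, candidate)] + BIGRAMS[(candidate, next_word)]

    # Sorting makes tie-breaking deterministic.
    return max(sorted(SOUND_ALIKES), key=score)


print(pick_sound_alike("like", "dog"))
print(pick_sound_alike("over", "."))
```

Even this crude counting picks the right spelling when the neighbors are informative; more data and richer context push the same idea much further.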

With the rise of Amazon’s Alexa, Google’s Assistant, Apple’s Siri (which is based on Nuance’s speech recognition), and Microsoft’s Cortana, contextual understanding leapt forward thanks to billions of utterances confined to domains such as maps and directions, computer commands (e.g., “open Word” or “send text message”), and automotive commands. A product of cloud computing’s massive power, contextual understanding has made speech recognition quick, convenient, and helpful.

But to get back to the opening point: What is a buyer or provider of speech-enabled technology to do? How can you protect yourself from rapid changes in alliances or terms and conditions with these global behemoths? And what about the potential for abuse of personally identifiable information?

All of the previously mentioned vendors either plan to offer or already offer enterprise integration, and several traditional enterprise software vendors are joining them. For the most part, their contracts follow those of most enterprise cloud computing offerings, but they conveniently leave many holes, so common sense should guide you in protecting yourself. For example, if you are a buyer of the end solution, no personally identifiable information (PII) should be shared outside of your organization. If you are using these third parties in your own speech-enabled solution, you should create an additional information barrier to protect end users in case your customers haven’t fully disassociated PII from speech processing.
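One way to picture disassociating PII from speech processing is a small scrubbing step that runs before any transcript leaves your organization. The sketch below is a starting point under stated assumptions, not a complete safeguard: the three patterns cover only simple U.S.-style formats, the function names (`redact`, `pseudonymize`, `prepare_for_vendor`) and the payload shape are hypothetical, and raw audio would need its own handling.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(text):
    """Replace each matched PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def pseudonymize(user_id, secret_key):
    """Derive a stable pseudonym with a keyed hash; the key stays in-house."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_for_vendor(user_id, transcript, secret_key):
    """Build the (hypothetical) payload that is allowed to leave the organization."""
    return {
        "speaker": pseudonymize(user_id, secret_key),
        "text": redact(transcript),
    }


payload = prepare_for_vendor(
    "jane.doe", "Call me at 555-867-5309 or mail jo@example.com", b"in-house-secret"
)
print(payload["text"])
```

Because the keyed hash never leaves your side, the third party sees a consistent speaker label without learning who the speaker is.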

Furthermore, given where the sentiment of the North American citizenry is heading when it comes to data privacy, adopting the European Union’s General Data Protection Regulation (GDPR) as a baseline might add a healthy dose of future-proofing to your speech-enabled solution, particularly one that relies on cloud processing.

On a more positive front, speech technologies are now changing at a rate once thought impossible, so keep your eyes open for opportunities to apply functionality that was undreamed of even a few years ago. And always keep in mind author Douglas Adams’s observation, “We are stuck with technology when what we really want is just stuff that works.”
