The Tipping Point of Speech Proliferation


Remember the Analog Display Terminal Interface (ADTI) phone that promised to let you surf the Web from your kitchen? How about virtual assistants, such as Wildfire, Hey Anita!, Webley, and General Magic? In the late 1990s, Web-enabled display phones and virtual assistants were a virtual market that didn’t go anywhere. However, what is old is new again: phones of all types now let you search the Web visually, by voice, or by a combination of the two, and any communication device can act as a personal assistant with speech serving as the voice user interface (VUI).

What happened to facilitate this change? Much as we hoped otherwise back then, those applications were ahead of their time. To review how we got here, I chatted with Michael Thompson, senior vice president and general manager of Nuance Mobile, about the changes that have occurred since the mid-1990s and the developments that brought speech as a user interface to the tipping point of mass adoption. Although this account focuses on Nuance, the history was shaped by many speech technology vendors.

To set the table, in the 1990s we had traditional telephone interfaces with minimal computing power and an infant Web. Contact centers were half a decade into providing interactive voice response (IVR) for self-service and had just begun to add VUIs as an option. It took an entire server to run a speech interface, and that interface consisted of a very canned, directed dialogue in which the caller took forever to accomplish anything.

What really pushed the growth and sophistication of speech applications was an acceleration in computing power and the development of the Internet. Standards emerged, servers became much more powerful, and mobile devices started to proliferate. Simultaneously, speech technologies became more sophisticated, and we shrank the footprint required to run them, enabling less expensive deployments and a mix of server and chip-based speech solutions, depending on the application.

As Thompson pointed out, what really carried Nuance and others through the middle years between concept and proliferation was the contact center. As contact center vendors pushed to speech-enable IVR and off-load agents by providing self-service options to customers, speech technology vendors, such as Nuance, advanced their core technologies. These advances included improvements in speed and accuracy, better grammars, more languages, and more natural-sounding text-to-speech. Nuance also advanced natural language capabilities to make VUIs acceptable and useful for customers, opening the door for truly effective voice search of databases and the Web. In addition, Nuance grew the embedded side of its business, first in automobiles and then on handsets, and continued to enhance dictation.

Speech Is Required

However, the real tipping point for speech came when the visual screen appeared on handsets, thereby unleashing an endless amount of content that could be provided to the end user, including multimodal applications. This coincided with the advent of unified communications (UC), which counts mobile applications among its core solutions for end users. For mobility solutions to be truly effective, speech technologies are required.

Now we have reached a point of convergence in the advancement of these different technologies—including greater computing power and scalability, improved natural language, and the unification of consumer and business applications—so that speech technologies are proliferating everywhere. Even on low-end cell phones, people can search the Web, command and control applications, dial by voice, search for businesses, buy and play music, order goods and services, update corporate databases, get driving directions, and so much more. Today’s mobile phones are sophisticated devices, fully connected to the network and multifaceted for both business and pleasure. 

Beyond UC-based mobility applications, speech technologies are now being incorporated into contact centers, enterprise desktop applications, entertainment software, and automotive applications, to name just a few areas.

My next column will focus on some of the new and revolutionary ways speech is being used in these areas. For example, I’ll discuss the growing use of speech-to-text, voice search, predictive text, and rich multimodal applications. Stay tuned.

Nancy Jamison is the principal analyst at Jamison Consulting. She can be reached at nsj@jamison-consulting.com.
