AVIOS Has High Hopes for Speech

The non-profit Applied Voice Input Output Society (AVIOS) has promoted the commercialization of speech and natural language technology for more than three decades. Now it says the technology is seeing broader adoption and will create opportunities for many new applications, easing the use of ever-more-complex digital technology.

"Early applications were primitive," admits K.W. (Bill) Scholz, AVIOS president, "but pioneers laid the groundwork for today's technology. We've finally reached the point where the promise of the technology has been met. Computer speech understanding may never reach the full capabilities of humans, but it has clearly passed the level of high utility."

William Meisel, executive director of AVIOS, notes that the maturing of speech technology will trigger a "chain reaction" of applications that include a human-computer connection wherever we go. "In my book, 'The Software Society,' I argue that the desire to have the power of computers always available to us as a constant companion that knows us well will change what it means to be human. Technology has always had a fundamental impact on the way we live, and being able to deal with technology using human language is a major development that will propagate throughout most products and services that deal directly with people."

The organization also sees smartphones driving the adoption of speech recognition and natural language understanding, where personal assistants such as Apple's Siri, Google Now, and Samsung S-Voice respond to voice commands. "Intelligent assistants will transform our interactions with technology by acting as go-betweens between us and the applications and devices in our environments," notes Deborah Dahl, principal at Conversational Technologies and an AVIOS board member. "The uniform speech and natural language-based user interface provided by personal assistants means that we won't have to learn how to interact with every software application and consumer device individually. The intelligent assistant will interpret our naturally spoken requests and take care of the details."

Another important driver of the adoption of voice technology is in automobiles, with the need for safe hands- and eyes-free control of increasingly complex infotainment systems. "The solution to reducing driver distraction likely resides in combining a variety of driver interfaces to fit specific tasks," noted Thomas Schalk, another AVIOS board member and vice president of voice technology at Agero. "The most critical role that speech plays is text entry while driving. But holistically, complex infotainment systems require simple user interfaces that are multimodal. Based on several recent studies, we're not there yet."

Nava Shaked, principal of Brit Business Technologies and an AVIOS board member, also noted that voice interaction doesn't stand alone in a user interface: "The interesting development with spoken language understanding technologies is that they are now being integrated with other multimodal technologies to create a holistic user experience where it is possible to maximize speech for the tasks it best fits. The rules of the game have changed, and the fusion of interface technologies led by voice and gesture interaction is now a must for mobile applications looking to create a natural and intuitive experience."

Is the technology ready for prime time? Matt Yuschik, mobile services architect at CitiCorp R&D and an AVIOS board member, urges analysts to look at the accuracy of speech recognition in context: "Speech recognition critics view accuracy without comparison to the alternatives. Let's not forget that humans make errors, too. Dialing a phone number (on a keypad) has error rates up to 10 percent. Thank the mobile phone for the visual display and the back/erase button! For dictation, human transcription rate error is about 2 percent. And the error rate for the mini-QWERTY keypad on smartphones is between 5 percent and 6 percent, even if you are looking at the buttons -- otherwise it jumps to 18 percent to 22 percent. We humans tend to forgive errors we seem to cause. Speech technology and its subsequent errors are transparently detected and corrected by syntax, semantics, and higher-level context sensitive rules of natural language processing. Multimodal interactions are making transaction throughput rates even faster and more successful. And the good news is that performance is continuing to improve!"

James Larson, vice president of Larson Technical Services and an AVIOS board member, noted that traditional telephone interactive voice response and voice search systems need not have the full functionality of intelligent agents to be effective. "Intelligent agents must apply knowledge about the real world, knowledge about the user, knowledge about the current context, knowledge about the meaning and structure of natural language, and be able to determine when they cannot respond to a request correctly. It will take much insight and user testing to build useful intelligent agents; they can't be built overnight."

Bruce Pollock, an AVIOS board member and vice president at West Interactive, noted the impact on customer service operations. "Speech recognition, particularly natural language speech, is helping more companies every day to improve their customer experience and lower their costs. A well-designed speech system helps to improve self-service resolution, and also helps to ensure that if a caller needs to get to an agent, they are transferred quickly and easily to the most suitable agent to get help. When integrated as part of an integrated, multichannel customer communication strategy (along with the Web, SMS, agent, etc.), speech can be an incredibly powerful tool."

TV is another area that will drive the need for flexible voice requests, with apps on smartphones perhaps acting as the remote control. Meisel noted that all the recent announcements for smart TVs include voice search to find and launch TV content.

Roberto Pieraccini, CEO of the International Computer Science Institute at Berkeley and an AVIOS board member, summarized, "The progress in computer speech understanding we have made during the past years is tremendous. After decades of activity in the field, I can see now how intelligent assistants and other applications based on voice recognition technology will become pervasive in the way we interact with machines. Science and technology need to address other robustness issues towards the achievement of truly human-like capabilities, but I am now confident we will see that happen in our lifetimes." 

Sara Basson, an AVIOS board member and program director at IBM Research, noted an additional dimension of personal assistant technology: "Users will expect personal assistants to be smart and intuitive, asking minimal clarification questions and quickly providing the service requested. Systems like IBM's Watson are being designed to provide immediate answers to specific queries, rather than burdening users to select from a lengthy set of possible answers. A speech recognition front end is the natural and logical interface for these smart systems -- with intelligent dialogue as the next frontier."

As technology grows more complex, with more devices and more features constantly appearing, using human language to deal with the complexity is a necessity, the AVIOS board agreed. The breakthrough in speech and natural language technology can keep consumer-facing technology from hitting a wall where customers resist "digital overload."
