Helping Developers Say Yes to Speech

Developers have struggled with speech for years, trying to work out how to integrate speech effectively into existing and planned computer and telecommunications projects. In the last year, Unisys has emerged as an important player in the speech industry, perhaps more as a company that enables speech development than as a direct provider. Unisys is actively pursuing alliances with other companies to get its Natural Language products widely deployed in the developer community. Additionally, the company's Natural Language Speech Assistant toolkit is being used by numerous developers to speech-enable applications. The toolkit allows developers to rapidly prototype the "listen and feel" of a real-world spoken language application before writing code or choosing a recognizer. In the past year, Unisys has forged strategic alliances with such key speech recognition providers as L&H, Philips, and, most recently, Lucent Technologies. In October, Philips signed with Unisys to allow the Philips ASR engine to be integrated and resold with the Unisys Natural Language products, becoming the first member of the Unisys Reseller program. In December, Lucent and Unisys signed a three-year agreement to integrate and resell their speech technology products in a software package combining Lucent's text-to-speech synthesis and ASR engines with the Unisys toolkit, targeting speech developers building for interactive voice response and telephony platforms. Both deals are part of the Unisys strategy of "taking the Ph.D. out of speech development," as Joe Yaworski, vice president and general manager of the Unisys Natural Language Understanding business unit, puts it. As Unisys takes on greater importance for speech developers and resellers, we thought our readers would be interested in learning more about the company's plans for developing and advancing speech. The logical person to speak with was Joe Yaworski.

What sparked your own interest in speech?
I really came at this more from a business and marketing background than from a speech background. Years ago, when I first saw some of the concepts of speech recognition outlined, I said, "Everyone who uses a phone will want to use this." That hasn't happened yet.

What will it take for speech to become more broadly accepted?

Speech needs to be in several environments. It has to be in call centers, phone networks, desktops and the Internet. As the technologies that power those environments merge, speech needs to be in a wide variety of applications. Until very recently, the time it took to develop a speech application was a real bottleneck. Developing a speech solution for an IVR system in a call center takes some doing. First, the caller, being on the telephone, can only receive verbal feedback. This means that the "listen and feel" needs to be right for the caller to feel comfortable with the interaction. Second, the system does not know who is talking, which means the speech recognizer needs to perform in a speaker-independent, grammar-constrained mode. That mode depends on being able to predict what callers will typically say in response to a prompt. Third, it requires a large-vocabulary speech recognizer capable of keeping hundreds or thousands of words active. This allows for a more intuitive, caller-driven interaction, in which a caller can respond to a prompt in their own words. Take as an example the prompt "Is this information correct? Please say yes or no." In creating the dialogues for the Natural Language Speech Assistant, we found that there are over 70 ways to say yes and almost 100 ways to say no. Our approach lets the IVR system step up to speech, which has a lot of advantages over the telephone's touch-tone keypad. It isn't just that speech is more efficient for the caller than touch-tone keypads.
With speech, call center management can pose and respond to many more open-ended questions than when customers are punching keys on a telephone keypad. An example would be requesting an item from a list, such as a state. The touch-tone approach would be something like "press 1 for Alabama, press 2 for Alaska, ... press 50 for Wyoming," which would not be a pleasant experience for the caller. With a spoken interface, the caller would simply say, "I want Pennsylvania, please." This is easy for the user.

I assume you are pursuing such a wide variety of alliances to be able to give your call center customers a variety of speech recognizers from which to choose. What reasons might Unisys have for recommending one speech package over another to a particular customer?
Some have better language coverage. Some handle specific languages better than others. Some handle numeric data better, and some are better with longer numbers. Some may handle dialects better than others. In general, covering North America, given the language, application requirements, and accents to consider, may require two to four different speech recognizers. The NLSA lets the developer choose the recognizer or recognizers that best fit the application and deployment needs.

What does the future hold for the commercial application of speech technology?
There is a growing market demand for natural speech applications, and I think you will see that become evident over the next three or four years. Speech systems will replace touch-tone systems for applications such as phone banking. Speech dramatically expands the potential for new types of self-service applications. It will lead to business cost savings, but it will also expand sales opportunities for call centers by making IVRs truly interactive, allowing open-ended conversations between companies and their customers. To this end, we have focused on making the development and deployment of speech applications easy for anyone doing touch-tone applications today.
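The speaker-independent, grammar-constrained mode Yaworski describes can be sketched in miniature: the grammar enumerates the surface forms callers are predicted to use, and each form maps to one canonical answer. Below is a minimal illustrative sketch in Python; the variant lists are small hypothetical samples, not the 70-plus forms Unisys catalogued, and this is not the NLSA itself.

```python
# Illustrative only: a grammar-constrained yes/no interpreter.
# Each canonical answer is paired with surface forms callers might say;
# the lists here are hypothetical samples, not the Unisys grammar.
YES_NO_GRAMMAR = {
    "yes": {"yes", "yeah", "yep", "sure", "correct", "right",
            "that's right", "uh huh", "absolutely"},
    "no": {"no", "nope", "nah", "incorrect", "wrong",
           "that's wrong", "not really"},
}

def interpret(utterance):
    """Map a recognized utterance to a canonical yes/no, or None."""
    text = utterance.strip().lower()
    for canonical, variants in YES_NO_GRAMMAR.items():
        if text in variants:
            return canonical
    return None  # out-of-grammar response

print(interpret("Yeah"))          # -> yes
print(interpret("that's wrong"))  # -> no
print(interpret("maybe"))         # -> None
```

An out-of-grammar result is typically the cue for the IVR to reprompt the caller; the same pattern extends to larger lists, such as matching a spoken state name against all fifty states.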