Industry Leader Focus
Two hot topics in the world of speech technology these days are multimodal input and customer service. Multimodal input combines technologies (keyboard, mouse and speech, to name one combination) to form the backbone of a more natural way of communicating with the computer. Customer service, at least as it relates to speech technology, uses speech recognition to better serve a business's customers. Without question, any in-depth discussion of those two facets of speech technology will include the names of Unisys and Microsoft. The two companies have been involved in many major speech developments over the years, and both are on the brink of announcing new initiatives that could have broad impact on the industry. Because of these upcoming announcements, and because the two were scheduled to deliver keynote addresses at SpeechTEK 2000 Nov. 2 and 3, Speech Technology Magazine
questioned Dr. Xuedong Huang of Microsoft and Joe Yaworski of Unisys about their visions for the speech technology industry.
Dr. Huang is the general manager of Microsoft's Speech.Net Group. He joined Microsoft to launch the company's Speech Technology Group in January 1993, and he and his team have created core technologies used in a number of Microsoft's products. His current interests include not only how to further improve spoken language components but also how to provide a compelling multimodal solution in a wireless mobile environment. Microsoft just released a test version of its Office software that includes speech recognition commands, and Chairman Bill Gates has been quoted recently as saying that speech and the entire realm of natural language interfaces will become an integral part of computing in the near future. Dr. Huang is an IEEE Fellow, an affiliate professor of electrical engineering at the University of Washington and an honorary professor of computer science at his alma mater, Hunan University. He served as vice general chair of ICASSP '98, as an associate editor of IEEE Transactions on Speech and Audio Processing from 1992 to 1996 and as a member of the Pattern Analysis and Applications editorial board. He is the co-author of two books and more than 100 papers on spoken language technology. He received his B.S. from Hunan University, his M.S. from Tsinghua University and his Ph.D. from the University of Edinburgh. Before joining Microsoft, he was on the faculty of Carnegie Mellon University, where he directed development of CMU's Sphinx-II speech recognition system.
Yaworski is the vice president and general manager of the Unisys Natural Language Business Initiative. Unisys provides hardware and software solutions, and the Natural Language Business Initiative is based on natural language understanding technology that lets computers recognize, understand and respond to normal human conversation.
The Unisys Natural Language Business Initiative provides applications, services, deployment options and technology for automating enterprise customer service functions. Yaworski has managed the initiative since 1996 and is responsible for its strategy development and execution; activities under his direction include development, support, services, marketing and sales. During his career with Unisys, he has held a number of management positions in sales, marketing and program management organizations. Recent assignments include director of sales and marketing for the Pacific Asia and Americas markets for Unisys UNIX and Enterprise Server products. In addition, he held a position in Japan with responsibility for starting and developing the Unisys UNIX business through Nihon Unisys Ltd. Yaworski started his career with Unisys in 1978 as a salesman for the government, education and medical markets in the mid-Atlantic area. He holds a bachelor of science degree in business and marketing from Pennsylvania State University.
ST: What areas of speech technology is your company focusing on right now?
HUANG, Microsoft: Microsoft is working hard to provide a Speech.Net platform that supports multiple devices, from PCs to smart phones. Microsoft also believes that speech can add significant value to .NET services. We need to bridge the gap between what people want and what the industry can deliver in the areas of speech recognition and synthesis.
YAWORSKI, Unisys:
The primary focus for Unisys is on the design, development and deployment of speech and natural language applications within contact centers, telecommunications and e-business/dot-com markets. Unisys provides a complete application solution that combines our tools and technology with the best speech products within the industry. We can offer these applications at the client's site or at our facilities through our voice-ASP.
ST: What role will speech play in our everyday lives in 10 years?
Huang:
It will empower people to access information anywhere and anytime on a number of different devices. This is particularly important for mobile computing. Speech is the only modality that can provide users with a consistent interaction model across office, home and mobile computing scenarios.
Yaworski:
The speech user interface (SUI) will be incorporated into most devices over the next 10 years. We are already seeing it at the desktop, in VCRs and in automobiles, and it is starting to become a more common interface for doing business over the phone. The use of SUI will increase dramatically over the next 10 years; it will be commonplace to talk to most devices, services or businesses and have them do what you request.
ST: Speech is beginning to emerge in all aspects of the marketplace (consumer electronics, telephony, mobile connectivity, etc.), but is there one area in which speech is going to play a dominant role in the near term?
Huang:
Speech will help to solve the text input problem for Far East languages such as Japanese and Chinese. The same applies to the telephony segment, because the keyboard is too "small." There are also eyes-busy and hands-busy situations that require speech. X.D. Huang, Alex Acero and H. Hon's latest book, Spoken Language Processing (Prentice Hall, 2001), discusses some of these aspects in detail.
Yaworski:
The automation of customer service will be a key area for speech and natural language in the near future because there are very compelling business and service reasons for automating most customer service interactions, and speech and natural language will be key to implementing that automation successfully. The ability for a business's clients to call at any time, from anywhere in the world, and efficiently get a consistent level of service is driving businesses to adopt speech and natural language. The value to the consumer is also dramatic: the ability to get information, perform transactions or receive service at any time from anywhere. Examples are getting weather, traffic or stock quotes, or making a reservation at a local restaurant, via your cell phone from your car.
ST: What is the market status for speech technology? Which areas are drawing the most attention from investors and financiers?
Huang:
There is still a gap between what people want and what the industry can deliver, so we should manage people's expectations carefully. The industry can't deliver a speech interface that is nearly as good as a real human being; we have to solve a lot of hard research problems before we can realize the full potential of speech technology.
Yaworski:
We have seen a significant increase in interest and business for speech and natural language since the beginning of this year. I believe that because the lights did not go out from Y2K, organizations began to look at how to use their Y2K reserve budgets to improve their businesses. One of the key areas for improving business is improving customer service, and speech and natural language are key to doing this, especially for large enterprises. The opportunity moving forward is to extend the marketplace beyond large enterprises to the rest of the market, bringing speech and natural language-based automation to small- and medium-size firms. To do this, we must simplify deployment for these firms by removing the complexity and expense. Achieving lower cost and ease of implementation will substantially increase the use of, and market for, speech and natural language. We see significant opportunities to do this by having firms outsource their customer service automation to a service provider, which would use speech and natural language to automate customer service cost-effectively.
ST: Mobile connectivity is a hot topic right now, with mobile phone and handheld device makers rushing to implement products that connect the user to the Web from anywhere at any time. What problems remain to be solved in this area, and where do you see this market evolving?
Huang:
Microsoft Research's MiPad is a great example that illustrates the kinds of problems people face in this segment (see http://www.research.microsoft.com/stg). With upcoming 3G wireless deployments in sight, the critical challenge for mobile devices remains the accuracy and efficiency of our spoken language systems, since they will likely be used in noisy environments without a close-talk microphone, and the server must also support a large number of mobile clients.
Yaworski:
The biggest hurdle to overcome is not technology but the human factors issue. The provision of Internet-based information to an individual on the run has significant benefits for the consumer and for businesses. The issue is how to present that information in such a way that the consumer will feel comfortable and want to continue to use it. Remember, most if not all of the information on the Internet is designed for a graphical user interface, which does not translate well to access via the phone. Using the phone requires that the information be provided through a customized SUI. If the SUI is done well, the caller will have a pleasurable experience; if it is done poorly, the experience can be frustrating to the point of turning the consumer off. The design of the dialog for an SUI is still more of an art than a science. That is why we have invested so much time and effort in understanding how to do this well and in creating tools to assist in the process.
ST: What is the future for speech as an input interface?
Huang:
Speech will play a very important role in helping people access information in the multimodal environment, and it will be one of the most important and widely used modalities in that environment.
Yaworski:
The SUI is just at the very beginning of its use. I would compare it to the use of the mouse with Windows 1.0: back then you knew the GUI would be important and would change computing, but I'm not sure people truly realized the impact the mouse would make. The speech and natural language interface has the same potential.
Gary Moyers is the executive editor of Speech Technology Magazine.