From E-Commerce to V-Commerce: The Voice Portal Revolution
Speech-enabled Internet portals, or voice portals, are the next logical step in the convergence of computer telephony and the Internet. Voice portals put all kinds of information at a consumer's fingertips, anytime and anywhere. Customers need only dial the voice portal's toll-free 800 number to be quickly guided to the information they seek. The customer answers with simple spoken responses, so there's no need to key in menu selections. This makes voice portals easy to use, even from an automobile. Conceptually, the process is no different from browsing the Web on a PC, and anyone can do it from any office, cellular or home phone. Made possible by advances in speech technology, voice portals are changing telephone interaction from a vendor-centric experience to a customer-centric one, increasing customer satisfaction while improving efficiency and cutting costs for businesses. Over the next few years, voice portals are poised to profoundly change both the way people use telephones and the way businesses view callers. In today's vendor-centric model, the caller can exit an organization's interactive voice response system only by hanging up. In tomorrow's customer-centric model, the caller will be able to interact with a voice portal, free to jump from one enterprise to another as quickly and easily as surfing the Web.
Speech technologies are hot
One of the key enablers of today's burgeoning e-commerce economy is speech technology. Analysts project that the market for speech technologies will grow 31 percent per year from 1999 through 2004, which suggests that we are now at a turning point for major deployments of speech-enabled applications. Speech has become hot for several reasons, rooted both in the technology itself and in how it enables developers to create applications that meet real user needs. The simple speech applications of the early 1990s supported only small-vocabulary (20 to 30 words) command and control and recognized only discretely spoken digits. Today we are on the verge of sophisticated speech-enabled applications such as personal virtual assistants, stock trading, enterprise auto attendants, travel reservation systems and many others. The very near future will see widespread public network deployment of applications such as large-vocabulary (one million entries) automated directory assistance, hosted business applications and voice portals.
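To put the projected growth rate in perspective, here is a quick worked calculation. It assumes the 31 percent rate compounds annually over the five year-to-year steps from 1999 to 2004; the analysts' market-size figures are not given here, so only the ratio can be computed. Writing $M_y$ for market revenue in year $y$ (a symbol introduced for this example):

```latex
\frac{M_{2004}}{M_{1999}} = (1 + 0.31)^{5} = 1.31^{5} \approx 3.86
```

In other words, at that rate the market would nearly quadruple over the period.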
Technology makes it so
The first key to the hot new generation of speech applications is the speech technology itself. The accuracy and vocabulary size of automatic speech recognition (ASR) engines have improved dramatically over the last few years. These improvements have been fueled by refined algorithms, dramatic increases in processing power, lower costs and more powerful technologies that make speaker-independent, continuous-speech applications a reality. Barge-in technology has also matured, allowing callers to speak over outbound prompts and still have their utterances recognized. The introduction of natural language capabilities has also had a dramatic effect on the usefulness of speech applications.
Service providers expand their role
Besides new and improved technologies, another key factor in the evolution of today's powerful speech-enabled applications has been the changing role of technology providers. Once primarily algorithm providers, today's service providers have grown into comprehensive organizations with experts in human factors design, application analysis, system analysis, linguistics and telephony application development. Service providers have also developed extensive professional service and support capabilities to create, deploy and maintain sophisticated speech applications.
New testing tools slash development time
Application testing has also improved, with new techniques that let technology providers develop applications they can deploy rapidly and consistently. Starting with the basic research design, a developer can rigorously pilot test a new application, then feed the results of each pilot test back into the development process until the application is ready for deployment. Once the application is deployed, the developer can apply the expertise gained to other, similar applications, making it easier to deploy new large-scale applications rapidly. Technology developers have also expended tremendous effort creating powerful tools to ease rapid deployment. One such tool is the dialog application component (DAC), a high-level applet that encapsulates much of the knowledge gained from designing and implementing frequently used caller interactions. For instance, DACs might let the user select an item from a list, or let the application collect credit card numbers, accept a yes or no answer, take travel origin and destination information or look up a stock quote. By packaging this expertise in reusable objects, DACs can dramatically reduce the time it takes to build a new application. Applications that once took 30 person-years to develop can now be built by a much smaller staff in months or even weeks.
Text-to-speech has arrived
Beyond speech recognition, today's improved text-to-speech (TTS) technology is another essential element in the evolution to voice portals. The latest generation of TTS technology greatly improves on the poor voice quality that prevented widespread acceptance of TTS in the past. Language support has also been greatly expanded. In addition, preprocessors have been developed to handle "dirty" data, which is essential for using TTS in real-world applications. For example, e-mail preprocessors that correctly handle acronyms, contractions and intonation have made it possible to build applications that read subscribers' e-mail messages to them over the telephone.
The Internet delivers
The Internet e-commerce revolution has led the public to expect immediate access to information and transaction processing. Much work has gone into building the infrastructure to support these expectations, centered around markup languages such as HTML. New speech-enabled markup languages such as VoiceXML are extending Internet capabilities to the telephone. These languages can leverage the entire infrastructure developed to support the Web, including its client/server architecture. The result is powerful speech server platforms that can be controlled by remote Web-based applications.
The evolution of speech-enabled IVR
The evolution of speech-enabled interactive voice response (IVR) has been dramatic. Consider an enterprise that gives callers access through an 800 number. In the beginning, a customer would dial the number and reach an operator who provided assistance. But organizations quickly realized that staffing is expensive and that automating many of the operator's tasks could save money; an IVR system provided this automation. The next step was to integrate the IVR system with a database so that it could handle dynamic information. Linking the enterprise's IVR systems through computer-telephony integration (CTI) let the system transfer a call to a live operator, whose screen would automatically display ("pop") personalized caller information, such as the status of open orders. Powerful ASR capabilities then made it possible to save even more by front-ending the IVR system with speech. For the caller, speech meant a more natural and pleasant interface. Speech-enabled IVR applications were also more efficient than dual-tone multi-frequency (DTMF), or touchtone, applications, yielding a better return on investment.
Figure 1 shows a collection of enterprises (A, B and C), each with its own 800 number that gives callers access to a speech-enabled IVR system. Enterprise D, which has chosen not to implement IVR, provides callers with the same functionality, but at a higher cost to the enterprise.
The Internet changes the picture. Figure 2 shows how an enterprise provides a presence to people at home using a Web browser on their computers. The enterprise must add a Web server that links the customer's browser to Web pages written in a markup language such as HTML. In this case, Enterprise D, which has not implemented a Web server, is completely invisible to a customer using a PC: the customer cannot reach Enterprise D's offerings over the Web.
Figure 3 shows how voice portals change the picture.
In this case, the customer calls an 800 number and reaches a voice portal, where the caller interacts with a sophisticated ASR system implemented by the portal. For the first three enterprises, the caller can reach stored information through VoiceXML-enabled entry points. User profiles stored at the voice portal facilitate transactions and roaming: by storing items such as credit card numbers, frequent-flier membership numbers and favorite restaurants in a secure format, the voice portal makes it much easier for the caller to complete transactions over the phone. What about Enterprise D? Because it has chosen not to implement a voice-enabled entry point via the Web, it is once again invisible to the caller.
The Voice Portal Revolution
With more than one billion phones expected to be in use by 2001, no enterprise can afford to be invisible. The motivation to implement access via a voice portal is very strong. Voice portals are a dramatic new addition to service providers' infrastructure, fundamentally changing the way enterprises view their callers and the way callers interact with the enterprise.
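Two portal functions described earlier, directing callers only to enterprises that publish a voice entry point and pre-filling transactions from stored user profiles, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular portal works: it assumes VoiceXML 2.0 syntax, and the enterprise names, URLs and profile fields are hypothetical. It also keeps profile data in the clear, whereas a real portal would store it in a secure (for example, encrypted) format.

```python
# Minimal voice-portal sketch. Hypothetical throughout: enterprise names, URLs
# and profile fields are invented, and VoiceXML 2.0 syntax is assumed. A real
# portal would encrypt stored profile data; this sketch does not.
from dataclasses import dataclass, field
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr

# Only enterprises that publish a VoiceXML entry point are listed, so an
# enterprise without one (like Enterprise D) never appears to callers.
ENTRY_POINTS: Dict[str, str] = {
    "Enterprise A": "http://a.example.com/voice/start.vxml",
    "Enterprise B": "http://b.example.com/voice/start.vxml",
    "Enterprise C": "http://c.example.com/voice/start.vxml",
}


def portal_menu(entry_points: Dict[str, str]) -> str:
    """Build a VoiceXML <menu>; saying an enterprise's name jumps to its entry point."""
    choices = "\n".join(
        f"    <choice next={quoteattr(url)}>{escape(name)}</choice>"
        for name, url in entry_points.items()
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>Say one of: <enumerate/></prompt>
{choices}
  </menu>
</vxml>
"""


@dataclass
class UserProfile:
    """Per-caller data the portal keeps so transactions need fewer questions."""
    caller_id: str
    stored: Dict[str, str] = field(default_factory=dict)

    def prefill(self, required: List[str]) -> Dict[str, str]:
        """Return the required fields already on file; the rest must be asked for."""
        return {k: self.stored[k] for k in required if k in self.stored}
```

Because the menu is generated from the directory, an enterprise becomes reachable by voice as soon as it publishes an entry point, while one without an entry point never appears in the menu at all.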
Gene Eagle is the technical marketing manager of the Speech Group for Dialogic, an Intel company.