Developing with VoiceXML: Language and Platforms
VoiceXML technology was developed to allow relatively unskilled individuals to quickly build speech-recognition-enabled applications. Great emphasis was placed on making the VoiceXML language easy to learn and use. However, the simplicity in building applications comes at a price when executing them. The design and deployment of VoiceXML gateway systems is complex and, at least so far, expensive.
The VoiceXML gateway stands between the caller and the content provider. As mentioned earlier, any Web server can be the source of VoiceXML scripts. However, the job of fetching and interpreting the scripts, managing telephony services, operating speech recognition engines, generating text-to-speech output, caching audio and text, and functioning as an HTTP client falls to the VoiceXML gateway platform.
As this article is written, commercial-grade VoiceXML gateways are being introduced into the market. Some are "turnkey" systems consisting of integrated hardware and software. Others are primarily software and are intended to run on hardware platforms built and operated by the buyer.
What about cost? Only a handful of VoiceXML gateways have been purchased at this time, so they are almost custom-built. Nevertheless, total costs for hardware and software together seem to be settling around $8,000 per port for small (4-port) systems and $2,000-$3,000 per port for larger systems (48, 96 or more ports). At those rates, a 4-port system costs about $32,000 and a 48-port system roughly $96,000 to $144,000.
Cost is not the only barrier to entry. In fact, some aspects of operating a VoiceXML gateway (primarily the telephony services and speech recognition engine) are so complex and require such a high level of skill that only the most technically savvy organizations can reasonably expect to field them in the near future. Even developing a VoiceXML application is not a trivial task. In addition to VoiceXML coding, a good developer needs to understand how speech recognition engines work, how to design a good speech UI, and how to write speech recognition grammars.
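To give a feel for what "VoiceXML coding" and grammar writing involve, here is a minimal sketch of a one-question dialog. It assumes VoiceXML 2.0 syntax with an inline XML-format grammar; a given gateway may expect a different VoiceXML version or grammar format. The form and field names, the city list, and the example.com URL are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: names, cities, and URL are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="weather">
    <!-- A field collects one piece of information from the caller. -->
    <field name="city">
      <prompt>Which city would you like the weather for?</prompt>

      <!-- The grammar lists what the recognizer should listen for. -->
      <grammar type="application/srgs+xml" version="1.0"
               mode="voice" root="cityname">
        <rule id="cityname" scope="public">
          <one-of>
            <item>Boston</item>
            <item>Chicago</item>
            <item>Seattle</item>
          </one-of>
        </rule>
      </grammar>

      <!-- Speech UI design: handle silence and unrecognized input. -->
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>
      <nomatch>
        <prompt>Sorry, I did not understand.</prompt>
        <reprompt/>
      </nomatch>

      <!-- Runs once the recognizer has matched one of the cities. -->
      <filled>
        <prompt>Getting the weather for <value expr="city"/>.</prompt>
      </filled>
    </field>

    <!-- The gateway, acting as an HTTP client, sends the result to a Web server. -->
    <block>
      <submit next="http://www.example.com/weather" namelist="city"/>
    </block>
  </form>
</vxml>
```

Even in this small script, most of the lines deal with the grammar and with error handling rather than the question itself, which is where an understanding of recognition engines and speech UI design comes into play.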
HOSTING SERVICES AND VOICE PORTALS
So if operating a VoiceXML gateway or building an application is technically difficult and expensive, are there any alternatives? Yes. Companies wanting to operate speech-recognition-enabled call centers or other information services can leave the technical details to hosting services, sometimes called voice service providers (VSPs). VSPs deploy and maintain the necessary hardware and software, usually at the same data centers used to house large Web sites. Customers pay for this service either on a per-minute or per-transaction basis or by leasing port capacity in blocks. Calls come in to a number specified by the customer and are answered with a customer-specific script. The fact that the application is hosted by a VSP is invisible to the caller. Major VSPs include SpeechHost and Voci.
Branded voice portals (VPs) such as TellMe or BeVocal also host applications for third parties. Voice portals differ from VSPs in that all calls come in to a single number and are answered with the brand name of the voice portal. Callers have access to different types of content (typically weather, news, etc.) and can navigate their way to hosted applications. VPs may charge in a manner similar to VSPs or may insert their advertising into hosted content.
The capabilities of individual VSPs and portals vary widely. Some offer a variety of services including VoiceXML and conventional speech recognition apps, IVR hosting, content, speaker identification, WAP and professional services to build, maintain and integrate a speech application. Others have more limited offerings, are dependent on technology from a particular vendor or host only applications they develop.
Beware of VSPs or VPs who claim to offer voice access to an existing Web site without substantial revisions. Existing Web sites are rarely constructed in a way that permits an effective voice interface.
VOICEXML AND THE OPEN SOURCE MOVEMENT
So what does the future hold for VoiceXML technology? VoiceXML may provide the magic key that finally brings speech recognition to the attention of the general public. Or the complexity and expense of operating gateways, combined with failure to achieve a standard (see sidebar), may doom this technology to being the speech equivalent of 8-track tapes. One promising development is movement toward an open source model. A major step in this direction was taken recently when SpeechWorks, a major ASR vendor, and Carnegie Mellon University, a leader in ASR research, announced the Open Source Speech Initiative. This project will include the release of an open source VoiceXML interpreter, a major component of a gateway system. The opportunity for the public at large to experiment with VoiceXML technology can only promote its acceptance. If other open source efforts (such as Apache and Linux) are an indication, the result will be better, cheaper and more standardized software than before.
Steve Ihnen is the vice president of applications development for SpeechHost Inc.