VoiceXML 2.0: A Real Standard for a Real World

VoiceXML 2.0 is more than just a standard.  It is a growing infrastructure and community of speech application developers.  VoiceXML has become the following:

1.  Open specifications that meet the needs of the speech development community

Three languages have reached full recommendation status: VoiceXML 2.0, Speech Synthesis Markup Language (SSML), and Speech Recognition Grammar Specification (SRGS). Additional languages are advancing toward full recommendation status:

  • VoiceXML 2.1 — a collection of eight new features.  All VoiceXML 2.0 applications will work without change under VoiceXML 2.1
  • Semantic Interpretation — a JavaScript-like language used to extract and translate words from the speech recognition engine into semantic concepts
  • CCXML — an event-based language used to manage telephone connections 
  • Pronunciation Lexicon Specification (PLS) — language for specifying how words are pronounced
  • VoiceXML 3.0 — a new dialog language which will contain several new features and extensions to VoiceXML 2.0/2.1
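
To make the idea behind Semantic Interpretation concrete, the following is a minimal sketch of translating recognized words into semantic concepts. It is written in Python purely for illustration; actual Semantic Interpretation tags are JavaScript-like snippets attached to grammar rules. The function name `interpret`, the `CITY_CODES` table, and the `origin`/`destination` slot names are all hypothetical and do not come from any W3C specification.

```python
# Illustration only: real Semantic Interpretation tags are JavaScript-like
# snippets in a speech grammar. Every name below is a hypothetical example.

# Assumed mapping from a recognized word to the semantic value it stands for.
CITY_CODES = {
    "boston": "BOS",
    "chicago": "ORD",
    "portland": "PDX",
}


def interpret(utterance: str) -> dict:
    """Translate the recognizer's word string into semantic slots."""
    result = {}
    words = utterance.lower().split()
    for i, word in enumerate(words):
        if word in CITY_CODES:
            # The word before the city decides which slot the city fills.
            preceded_by_from = i > 0 and words[i - 1] == "from"
            slot = "origin" if preceded_by_from else "destination"
            result[slot] = CITY_CODES[word]
    return result


print(interpret("I want to fly from Boston to Chicago please"))
```

The application then works with the two slot values rather than the raw word string, which is the kind of separation between recognition and meaning that the Semantic Interpretation language is meant to provide.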

2.  The most widely used speech application development language

The adoption of VoiceXML 2.0 has been phenomenal. Ken Rehor's "The World of VoiceXML" Web site reports more than 46 VoiceXML platforms, 21 voice service providers, and four free openware implementations of VoiceXML. These implementations support hundreds of VoiceXML applications that interact with thousands of callers every hour of the day. The number of VoiceXML implementations is expected to quadruple this year.

3.  Portable across certified platforms

The VoiceXML Forum has created a series of tests to validate whether a VoiceXML implementation conforms to the official W3C recommendations. Your VoiceXML application is portable among platforms from vendors that have completed the VoiceXML platform certification test.

4.  Community of skilled developers

The VoiceXML Forum is actively engaged in enlarging and improving the community of skilled VoiceXML developers.  Its programs include:

  • The VoiceXML Review  — bi-monthly e-zine containing articles, programming hints, suggestions, and other topics all related to VoiceXML 
  • VoiceXML online tutorials  — useful explanations for how to develop VoiceXML applications
  • Developer training and certification  — the VoiceXML developer certification test identifies and rewards developers who are very skilled in VoiceXML 
  • Webinars  — specialized VoiceXML topics presented by experts
  • Education exchange  — a Web site listing of resources available to instructors of university and college courses involving VoiceXML training and research
  • Community message board  — an electronic bulletin board where developers can discuss problems and trade ideas

5.  Basis for new, higher-level standards

Beyond tools that simplify the development of VoiceXML applications, developers need tools for configuring, testing, and monitoring those applications so they can identify and correct trouble spots and improve the user experience. Efforts to create environments above the VoiceXML level include the following:

  • Extensible Human-Machine Interface  (xHMI) — an open, XML-based configuration and control language from ScanSoft that enables compatibility among speech applications, tools, application frameworks, and components.  xHMI enables interoperability between application components from different vendors.
  • Reusable Dialog Components  (RDC) — a collection of components from IBM that contain a data model, speech-specific assets like grammar and prompts, configuration files, and the dialog logic needed to collect a piece of information. The VoiceXML that performs the VUI is generated by the component implementation. A developer writes an application by integrating these components and specifying their run-time behaviors through component attributes and configuration files.
  • X+V — IBM, Motorola, and Opera have contributed XHTML plus Voice   to the W3C.  X+V enables developers to insert pieces of VoiceXML code directly into HTML applications, enabling multimodal applications into which the user may speak and/or type. Both IBM  and Opera  have implementations.


Each of the W3C languages undergoes a rigorous process to become a W3C recommendation. This results in a set of well-defined, open languages that can be used together to develop speech applications. These languages have been widely implemented and are backed by a platform conformance program that improves portability and by programs that raise the skill level of the community of VoiceXML developers. Just as the Internet is ingrained in the way we access information, VoiceXML is ingrained in the way we develop speech applications.

Dr. James A. Larson is manager, advanced human input/output, Intel, and author of the home study course VoiceXMLGuide, http://www.vxmlguide.com.  He can be reached at jim@larson-tech.com.
