Speech Technology Magazine


Beyond VoiceXML 2.0

The W3C Voice Browser Working Group reviewed more than 700 requests for change to VoiceXML 1.0. After careful deliberation, many of these were adopted, resulting in VoiceXML 2.0 which became a recommendation in March 2004. The VBWG is now working on two efforts to make VoiceXML even better.
By James A. Larson and Scott McGlashan - Posted Jul 8, 2004


VoiceXML 2.1 contains features already implemented by speech platform vendors. These include (a) dynamic referencing of grammars and scripts, (b) a <data> element to fetch XML data structures without transitioning to a new dialog, and (c) facilities to record utterances during recognition. VoiceXML 2.1 is completely backward compatible with VoiceXML 2.0: all VoiceXML 2.0 applications will run without modification under VoiceXML 2.1. For more details, see http://www.w3.org/TR/2004/WD-voicexml21-20040323/.
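The first two of these features can be sketched as follows. Element names follow the VoiceXML 2.1 working draft; the URLs, the variable names, and the language-selection expression are invented for illustration only:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="quote">
    <!-- (b) <data>: fetch an XML document into an ECMAScript variable
         without transitioning to a new dialog -->
    <data name="prices" src="http://example.com/prices.xml"/>
    <field name="ticker">
      <!-- (a) dynamic reference: the grammar URI is computed at runtime
           (here from a hypothetical application variable) rather than
           fixed at authoring time -->
      <grammar srcexpr="'http://example.com/grammars/' + application.lang + '/tickers.grxml'"
               type="application/srgs+xml"/>
      <prompt>Which stock would you like a quote for?</prompt>
    </field>
  </form>
</vxml>
```

Under VoiceXML 2.0, both the data fetch and the grammar URI would have had to be fixed when the page was generated.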

"V3" is the code name for the follow-on to VoiceXML 2.0. Requirements for V3 are derived from deferred VoiceXML 2.0 change requests, the SALT 1.0 and XHTML + Voice 1.0 (X+V) contributions to W3C, as well as the Multimodal Interaction and other W3C Working Groups. V3 will support:

  • Modularization – a set of modules with common external interfaces. This enables dialog designers to mix and match voice with other modes of input, including keyboard and pen. For example, a V3 module for speech recognition might be embedded into XHTML, enabling a graphical Web page to accept speech input from the user.
  • Extensibility – to extend the power of dialog management. VoiceXML 2.0 already supports both system-directed and mixed-initiative dialogs. V3 will allow for plan-based or rules-based dialog definition by enabling the dialog author to define new dialog strategies based on low-level components.
  • Low-Level Control of Media Resources – including speech recognition, speech synthesis and audio replay. Using these resources, application developers will be able to specify their own control structures, enabling a procedural style of dialog specification similar to that used by SALT developers in addition to the declarative programming style of VoiceXML 2.0 enabled by the Forms Interpretation Algorithm.
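The modularization idea above is already visible in the X+V contribution, where a VoiceXML fragment is embedded in an XHTML page and bound to graphical elements with XML Events. The following simplified X+V-style fragment is illustrative only (it follows the X+V 1.0 profile, not any published V3 draft, and the field names and grammar URI are invented):

```xml
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Voice-enabled city form</title>
    <!-- a speech dialog module declared in the page head -->
    <vxml:form id="sayCity">
      <vxml:field name="city">
        <vxml:prompt>Which city?</vxml:prompt>
        <vxml:grammar src="cities.grxml" type="application/srgs+xml"/>
      </vxml:field>
    </vxml:form>
  </head>
  <body>
    <!-- focusing the text field activates the voice dialog,
         so the user may either type or speak the city -->
    <input type="text" name="city"
           ev:event="focus" ev:handler="#sayCity"/>
  </body>
</html>
```

In V3 terms, the `<vxml:form>` would be one reusable module with a defined external interface, mixed into a host language that supplies the other input modes.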

The VBWG will continue to coordinate with the Multimodal Interaction Working Group to guarantee the compatibility of V3 with the multimodal languages.
