VoiceXML 2.1 Sets the Standard

VoiceXML 2.1’s release created quite a buzz following the World Wide Web Consortium’s (W3C) announcement on June 19, even though the updated standard includes just two completely new features. The announcement made an impact because some speculated that version 2.1 signaled a soon-to-follow version 3.0. In addition, those two new features, the <data> and <foreach> tags, have generated significant user interest.

VoiceXML 2.1 builds on version 2.0 by extending its dialogue language, which improves interoperability among voice browsers and speech recognition systems. Features that platforms had already commonly implemented are now ingrained in the standard itself, making them easier to use.

The version’s features include referencing grammars and scripts dynamically, detecting when a barge-in occurs during a prompt, and processing multiple sets of data from the server in a single access. Two of the eight features are new; the remaining six are enhancements of existing VoiceXML elements.

Jeff Kusnitz, senior software engineer at IBM, says the small scale of these changes indicates that VoiceXML 2.1 was not intended to replace VoiceXML 2.0; the changes were simply significant enough to warrant documenting and standardizing. A third incarnation of the standard is therefore expected to follow relatively soon.

“The W3C Voice Browser Working Group has been working on VoiceXML 3.0 in parallel with not only the 2.1 effort, but several other efforts,” Kusnitz says. “Now that VoiceXML 2.1 is a recommendation, I’d expect even more time and energy devoted to the VoiceXML 3.0 effort.”

Deborah Dahl, chair of the W3C’s Multimodal Interaction Working Group, also confirms that version 3.0 is in the pipeline.

One new feature of VoiceXML 2.1, the <data> tag, allows for quicker transactions by letting an application fetch data from the server during a customer interaction without leaving the current document. Dahl explains, “Previously, the application would collect information from the user, send it to the server, and then get a completely new page from the server in order to continue the interaction. Although dialogues can certainly be supported this way, it makes the development process more straightforward to continue the dialogue as defined in the same document as it’s now allowed by the <data> tag.”

As a result, when it doesn’t need to constantly generate new pages, the server is freed up to work on other tasks. For businesses, this translates into increased productivity and speedier customer interactions. As customers continue to demand faster, more efficient service, the <data> tag gives businesses a simple means of creating a more convenient user experience.
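
To make the idea concrete, here is a minimal sketch of how a dialogue might use the <data> tag. It is an illustration rather than a production application: the URL, the form and variable names, and the assumption that the server replies with a single <balance> element are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="balance">
    <!-- Collect the account number from the caller. -->
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
    </field>
    <block>
      <!-- Hypothetical URL. Sends the collected account number and
           exposes the returned XML as a read-only DOM object named
           "reply", without loading a new VoiceXML page. -->
      <data name="reply" src="http://example.com/balance" namelist="account"/>
      <!-- Assumes the reply looks like <balance>42.17</balance>,
           so the root element's first child is its text. -->
      <var name="amount" expr="reply.documentElement.firstChild.data"/>
      <prompt>Your balance is <value expr="amount"/>.</prompt>
    </block>
  </form>
</vxml>
```

The dialogue continues in the same document after the fetch, which is the behavior Dahl describes; the server only has to return a small piece of data instead of a whole new page.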

A second new feature, the <foreach> tag, helps applications read lists aloud to callers. Dahl explains that the tag allows the VoiceXML interpreter to go through individual items of a list, “even if it didn’t know how long the list would be ahead of time.” For example, an application that lets you buy movie tickets by phone could step through a list of movies and times, reading each one to the user, however many entries the list contains.
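
The movie ticket example could be sketched roughly as follows. The array contents and the names showings and showing are made up for illustration; in a real application the array would more likely be built from a server response, which is why its length would not be known in advance.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Illustrative data only; a real application would typically
       build this array from data returned by the server. -->
  <script><![CDATA[
    var showings = [
      { title: "Movie One", time: "7 PM" },
      { title: "Movie Two", time: "9 30 PM" }
    ];
  ]]></script>
  <form id="listings">
    <block>
      <prompt>
        Here are tonight's showings.
        <!-- Repeats the enclosed prompt content once per array item,
             binding the current item to "showing" each time. -->
        <foreach item="showing" array="showings">
          <value expr="showing.title"/> at <value expr="showing.time"/>.
          <break time="300ms"/>
        </foreach>
      </prompt>
    </block>
  </form>
</vxml>
```

The same markup works whether the list holds two showings or twenty, since the loop takes its length from the array itself.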

“The <data> tag, along with the <foreach> tag, give application developers a tremendous amount of power and flexibility, and push VoiceXML one step further down the road to a complete separation of the data and presentation layers,” Kusnitz states.

In addition to VoiceXML, W3C added modifications to its Semantic Interpretation for Speech Recognition (SISR 1.0) specification. SISR 1.0 can now extract and translate textual representations of words identified by a speech recognition system and structure the result in a convenient format for the speech application. For example, if a caller told the system, “I want to fly from Los Angeles to Seattle,” the result would be translated into departure: LAX and destination: SEA.
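
A speech grammar along the following lines could produce that result. This is a minimal sketch, not an excerpt from any real application: the rule names flight and city are illustrative, and only the two cities from the example are listed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" mode="voice" root="flight"
         tag-format="semantics/1.0">
  <!-- Top-level rule: matches the whole sentence and fills two slots. -->
  <rule id="flight" scope="public">
    I want to fly from
    <ruleref uri="#city"/>
    <!-- rules.city holds the result of the most recent match of "city". -->
    <tag>out.departure = rules.city;</tag>
    to
    <ruleref uri="#city"/>
    <tag>out.destination = rules.city;</tag>
  </rule>
  <!-- Maps each spoken city name to an airport code. -->
  <rule id="city">
    <one-of>
      <item>Los Angeles <tag>out = "LAX";</tag></item>
      <item>Seattle <tag>out = "SEA";</tag></item>
    </one-of>
  </rule>
</grammar>
```

The <tag> elements carry the semantic interpretation: the recognizer matches the words, and the tags turn them into the structured departure and destination values that the application actually needs.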

Each of the 437 companies registered with W3C’s VoiceXML Forum has the option to undergo a platform certification program. The program, whose pilot launched in June, includes certification for both VoiceXML 2.1 and SISR 1.0. An extended platform certification test suite is under development; W3C has encouraged its forum members to help define and develop the suite. In addition, certification testing for additional languages is expected in the future.
