World Wide Web Consortium Issues SSML 1.0 as a W3C Recommendation

The World Wide Web Consortium (W3C) has published the Speech Synthesis Markup Language (SSML) 1.0 as a W3C Recommendation. SSML 1.0 is a fundamental specification in the W3C Speech Interface Framework. Application designers for mobile phones, personal digital assistants (PDAs), and a host of emerging technologies use SSML 1.0 to achieve both coarse- and fine-grained control of aspects of speech synthesis, including pronunciation, volume, and pitch. Like its companion W3C Recommendations, VoiceXML 2.0 and the Speech Recognition Grammar Specification (SRGS), both published by the W3C Voice Browser Working Group, SSML 1.0 is built for integration with other Web technologies and to promote interoperability across different synthesis-capable platforms.

"I am excited about the progress the Voice Browser Working Group has made in providing improved access to services over the telephone through the use of Web technologies," said W3C director Tim Berners-Lee, who will be delivering a keynote address at the SpeechTEK Conference next week. He added, "Companies can now offer Web access to their customers via the telephone as well as from a personal computer."

Aimed at the world's estimated two billion fixed line and mobile phones, W3C's Speech Interface Framework -- a collection of specifications for building voice applications for the Web -- will allow an unprecedented number of people to use any telephone to interact with appropriately designed Web-based services by using key pads and spoken commands and by listening to pre-recorded speech, synthetic speech, and music.

A World Wide Web Consortium (W3C) Recommendation is understood by industry and the Web community at large as a Web standard. Each Recommendation is a stable specification developed by a W3C Working Group and reviewed by the W3C Membership. Recommendations promote interoperability of Web technologies by explicitly conveying the industry consensus formed by the Working Group.

One of the primary challenges to strengthening the voice of the Web that SSML addresses is pronunciation. For example, how do you pronounce "1/2"? The SSML 1.0 specification uses this simple example to illustrate some of the challenges of turning general-purpose text into meaningful synthesized speech. Without additional context, one would not know whether to say "one half" or "January second" or "February first" or "one divided by two." SSML 1.0 constructs help eliminate this sort of ambiguity. The SSML vocabulary allows word-level, phoneme-level, and even waveform-level control of the output to satisfy a wide spectrum of application scenarios and authoring requirements.
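
The "1/2" ambiguity can be illustrated with a minimal sketch in Python that generates two small SSML documents, one for each intended reading. The `speak`, `sub`, and `say-as` elements and the namespace URI come from SSML 1.0; the helper names (`ssml`, `build_prompt`) are hypothetical, and the `interpret-as="date"` and `format="md"` values are an assumption, since SSML 1.0 itself does not define the permitted values and a given synthesizer may accept different ones.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # SSML 1.0 namespace
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace


def ssml(tag):
    """Return the namespace-qualified name of an SSML element."""
    return f"{{{SSML_NS}}}{tag}"


def build_prompt(reading):
    """Build a <speak> document that pins down how "1/2" is read.

    reading: "fraction" uses <sub alias="...">; "date" uses <say-as>.
    """
    speak = ET.Element(ssml("speak"), {"version": "1.0", XML_LANG: "en-US"})
    speak.text = "The value is "
    if reading == "fraction":
        # <sub> replaces the written form with an explicit spoken form.
        token = ET.SubElement(speak, ssml("sub"), {"alias": "one half"})
    elif reading == "date":
        # Assumption: "date" / "md" are not defined by SSML 1.0 itself;
        # the values a particular synthesizer accepts may differ.
        token = ET.SubElement(
            speak, ssml("say-as"), {"interpret-as": "date", "format": "md"}
        )
    else:
        raise ValueError(f"unknown reading: {reading!r}")
    token.text = "1/2"
    token.tail = "."
    return ET.tostring(speak, encoding="unicode")


fraction_doc = build_prompt("fraction")
date_doc = build_prompt("date")
print(fraction_doc)
print(date_doc)
```

In both documents the written text stays "1/2"; only the markup around it changes, which is the kind of word-level control the paragraph above describes.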

Like XHTML, SSML is a markup language based on the widely deployed XML standard. SSML content can stand alone or be included in other XML content in order to improve rendering as synthesized speech. SSML is particularly well suited for use with a VoiceXML wrapper when building an interactive voice response application.
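
As a rough sketch of the "VoiceXML wrapper" idea, the Python snippet below builds a minimal VoiceXML 2.0 dialog whose prompt carries SSML-style markup. It assumes that, inside a VoiceXML `prompt`, the speech elements (`prosody` and `emphasis` here) are written in the VoiceXML namespace; the helper names (`vxml`, `wrap_prompt`) and the sample wording are hypothetical and not taken from either specification.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace
ET.register_namespace("", VXML_NS)  # serialize VoiceXML as the default namespace


def vxml(tag):
    """Return the namespace-qualified name of a VoiceXML element."""
    return f"{{{VXML_NS}}}{tag}"


def wrap_prompt(before, emphasized, after):
    """Wrap a marked-up prompt in a minimal VoiceXML form."""
    root = ET.Element(vxml("vxml"), {"version": "2.0"})
    form = ET.SubElement(root, vxml("form"))
    block = ET.SubElement(form, vxml("block"))
    prompt = ET.SubElement(block, vxml("prompt"))
    prompt.text = before
    # SSML-style markup: slow the speaking rate and stress one phrase.
    prosody = ET.SubElement(prompt, vxml("prosody"), {"rate": "slow"})
    emphasis = ET.SubElement(prosody, vxml("emphasis"))
    emphasis.text = emphasized
    prosody.tail = after
    return ET.tostring(root, encoding="unicode")


dialog = wrap_prompt("Your order ships on ", "January second", ".")
print(dialog)
```

The dialog structure (`form`, `block`, `prompt`) belongs to VoiceXML, while the markup inside the prompt controls how the synthesizer renders the text.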

SSML 1.0 is built for Web integration in other ways as well. The Voice Browser Working Group worked closely with other W3C groups to ensure that the design of SSML 1.0 is consistent with principles of accessibility, internationalization, and general Web architecture. One important application of SSML involves "text phones" that may be used by people with some hearing disabilities. The same content can also be output as speech through a common telephone. SSML 1.0 is also consistent with previous work at W3C on describing pronunciation with Cascading Style Sheets (CSS). W3C's CSS Working Group is developing a speech module in CSS3 for rendering XML documents with SSML-based speech engines.

The test suite developed by W3C's Voice Browser Working Group (discussed in the July 2004 SSML implementation report) has helped ensure consistent behavior and quality among the already numerous implementations of SSML 1.0. Vendors that have already implemented SSML 1.0 and that are participating in the Working Group include: Aspect Communications, France Telecom, Hewlett-Packard, IBM, Loquendo, Microsoft, MITRE, Nuance Communications, SAP, ScanSoft, Sun Microsystems, VoiceGenie Technologies, Voxeo, and Voxpilot.

The Working Group will now focus its energies on the remainder of the Speech Framework. "After VoiceXML 2.0 and Speech Recognition Grammar Specification (SRGS), SSML is the third language of the W3C Speech Interface Framework to become a full W3C Recommendation," said Jim Larson, manager, advanced human input/output, for Intel and also co-chair of W3C's Voice Browser Working Group. "We are working to complete work on other languages of the W3C Speech Interface Framework, including VoiceXML 2.1, Semantic Interpretation, and the Call Control eXtensible Markup Language (CCXML)."

The Working Group is among the largest and most active in W3C. Its participants include: Aspect Communications, BeVocal, Brooktrout Technology, Canon, Comverse Technology, Convedia, Electronic Data Systems, France Telecom, Genesys Telecommunications Laboratories, HeyAnita, Hitachi, Hewlett-Packard, IBM, Intel, IWA-HWG, Korea Association of Information and Telecommunication, Loquendo, Microsoft, MITRE, Mitsubishi Electric, Motorola, Nokia, Nuance Communications, Openstream, SAP, ScanSoft, Siemens, Sun Microsystems, Syntellect, Tellme Networks, Verascape, Vocalocity, VoiceGenie Technologies, Voxeo, and Voxpilot.
