September 10, 2010
By Leonard Klie, Editor, Speech Technology and CRM magazines
Speech Technology News

SSML 1.1 Becomes a W3C Recommendation

The World Wide Web Consortium (W3C) this week extended speech on the Web to an enormous new market by improving support for Asian languages and multilingual voice applications. The Speech Synthesis Markup Language (SSML 1.1) Recommendation, released September 7, provides control over voice selection as well as speech characteristics such as pronunciation, volume, and pitch.
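To give a sense of what that control looks like in practice, here is a minimal sketch of an SSML prompt embedded in a short Python script. The prompt text, the choice of a female voice, and the helper name prosody_settings are illustrative assumptions rather than anything taken from the Recommendation's examples; the script only parses the markup with the standard library and does not synthesize speech.

```python
import xml.etree.ElementTree as ET

# Namespace used by SSML documents.
NS = {"s": "http://www.w3.org/2001/10/synthesis"}

# Illustrative prompt (assumed content): <voice> selects a voice,
# <sub> controls how "W3C" is pronounced, and <prosody> adjusts
# pitch and volume for one sentence.
PROMPT = """<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice gender="female">
    Welcome to the <sub alias="World Wide Web Consortium">W3C</sub> demo.
    <prosody pitch="high" volume="loud">This sentence is higher and louder.</prosody>
  </voice>
</speak>"""


def prosody_settings(ssml: str) -> dict:
    """Return the attributes of the first <prosody> element (hypothetical helper)."""
    root = ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
    prosody = root.find(".//s:prosody", NS)
    return dict(prosody.attrib) if prosody is not None else {}


if __name__ == "__main__":
    print(prosody_settings(PROMPT))
```

A real deployment would hand the same markup to a TTS engine that supports SSML instead of parsing it locally.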

SSML is part of W3C's Speech Interface Framework for building voice applications, which also includes the widely deployed VoiceXML and the Pronunciation Lexicon Specification (PLS), which gives speech engines guidance on proper pronunciation.

"With SSML 1.1 there is an intentional focus on Asian language support," said Dan Burnett, co-chair of the Voice Browser Working Group, director of speech technologies and standards at Voxeo, and co-author of the standard, "including Chinese languages, Japanese, Thai, Urdu, and others, to provide a wide deployment potential. With SSML 1.0 we already had strong traction in North America and western Europe, so this focus makes SSML 1.1 incredibly strong globally. We are really pleased to have many collaborators in China, in particular, focusing on SSML improvements and iterations."

The multilingual enhancements in this version of SSML result from discussions at W3C workshops held in China, Greece, and India. SSML 1.1 also gives application designers greater control over voice selection and over the handling of content in unexpected languages.

Estimates suggest that around 85 percent of interactive voice response (IVR) systems deployed in North America and Western Europe use VoiceXML and SSML. The new version of SSML is expected to open significant new markets, thanks to its improved support for languages outside Western Europe. A number of North American and European vendors of text-to-speech (TTS) products have indicated they expect to support SSML 1.1 within the coming year.

SSML 1.1 builds on this body of work by extending TTS control to more parameters. The trimming attributes, for example, let an application render different extracts of a prompt or audio file depending on context; the language attribute allows any voice to speak any language; and lexicon activation and deactivation make it possible to use multiple, even conflicting, lexicons in different contexts.
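The three features can be sketched in a single SSML document, shown here inside a short Python script that parses it with the standard library. The mapping of the article's terms to markup is our reading of the specification and should be checked against the Recommendation: trimming is assumed to be the clipBegin and clipEnd attributes on audio, the language attribute is assumed to be languages on voice (used here with a lang element), and lexicon activation is assumed to be the lookup element referring to a lexicon by its xml:id. The example.com URLs, the lexicon id "med", and the helper name summarize are placeholders.

```python
import xml.etree.ElementTree as ET

NS = {"s": "http://www.w3.org/2001/10/synthesis"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative document; URLs and ids are placeholders, not real resources.
DOC = """<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <lexicon uri="http://www.example.com/medical.pls" xml:id="med"/>
  <!-- Trimming: play only seconds 2 to 5 of the recording. -->
  <audio src="http://www.example.com/welcome.wav"
         clipBegin="2s" clipEnd="5s">Welcome.</audio>
  <!-- Language: request a voice that can speak Japanese. -->
  <voice languages="ja"><lang xml:lang="ja">こんにちは</lang></voice>
  <!-- Lexicon activation: the "med" lexicon applies only inside lookup. -->
  <lookup ref="med">The patient has hypertension.</lookup>
</speak>"""


def summarize(ssml: str) -> dict:
    """Collect the trimming, language, and lexicon settings (hypothetical helper)."""
    root = ET.fromstring(ssml)
    return {
        "clips": [(a.get("src"), a.get("clipBegin"), a.get("clipEnd"))
                  for a in root.iterfind(".//s:audio", NS)],
        "voice_languages": [v.get("languages")
                            for v in root.iterfind(".//s:voice", NS)],
        "lexicons": sorted(l.get(XML_ID)
                           for l in root.iterfind(".//s:lexicon", NS)),
        "lookups": [l.get("ref") for l in root.iterfind(".//s:lookup", NS)],
    }


if __name__ == "__main__":
    for key, value in summarize(DOC).items():
        print(key, value)
```

Text outside the lookup element would be pronounced without the "med" lexicon, which is how two conflicting lexicons could coexist in one document.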

"SSML is an important part of the overall ecosystem of W3C standards enabling speech across a variety of applications," says Burnett. "SSML in particular provides a key way to render richer, more natural-sounding speech. We are particularly pleased that SSML 1.1 provides advancements in several key areas, including support for Asian and Eastern European languages as well as improved audio controls for authors."

"The SSML specification is a significant development for application developers and technology integrators working around speech, as it hugely simplifies the creation of speech-based applications on the Web and elsewhere," adds Paolo Baggia, director of international standards at Loquendo and co-author of the standard, in a statement.
