February 17, 2009
By Leonard Klie, Editor, Speech Technology and CRM magazines
Speech Technology News

W3C Drafts New Multimodal Standard

To ensure that the Web is available to all people on any device, the Multimodal Interaction Working Group of the World Wide Web Consortium (W3C) last week published a new standard to enable interactions beyond the familiar keyboard and mouse.

EMMA, the Extensible MultiModal Annotation specification, promotes the development of rich Web applications that can be adapted, at lower cost, to more input modes (such as speech, graphical user interface, keyboard, or mouse) and output modes (such as synthesized speech). For speech-enabled applications in particular, EMMA enables a richer representation of speech input in the form of annotated speech recognition results, which can then be passed to a voice platform. It allows developers to separate the logic layer of an application from the interaction layer, making it easier to adapt applications to new scenarios.

"EMMA is going to make multimodal applications much easier to develop," says Deborah Dahl, chair of the W3C's Multimodal Interaction Working Group.

This specification, the final, official version of a standard that was first published about a year ago, describes the suggested markup for representing interpretations of user input, whether entered by speech, keystroke, or pen, together with annotations such as confidence scores, timestamps, and input medium. It forms part of the proposals for the W3C Multimodal Interaction Framework.

"Loquendo welcomes the publication of EMMA 1.0," says Daniele Sereno, vice president of product engineering at Loquendo, a longstanding, participating member of the W3C Multimodal Interaction and Voice Browser working groups. "It will facilitate the creation of multimodal applications together with more powerful speech applications while encouraging innovation and the advancement of the Web, as well as making businesses more competitive."

That's because "EMMA will make it easier for smaller organizations to create multimodal applications," Dahl states. "They can create one application to the standard rather than having a full application set that addresses all the multiple modalities."

Because some input modalities are more prone to noise or interference than others, EMMA allows developers to account for the ambiguity in user input so that during later stages of processing, it is possible to select from among competing hypotheses and overcome errors.
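As a rough illustration of what selecting among competing hypotheses can look like in practice, the Python sketch below parses a small EMMA document and picks the interpretation with the highest confidence score. The element and attribute names (emma:one-of, emma:interpretation, emma:confidence, emma:medium, emma:mode, emma:tokens) and the namespace come from EMMA 1.0; the sample utterances, the destination payload element, and the best_interpretation helper are invented for this example and are not taken from the specification or from any vendor's platform.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the EMMA 1.0 specification.
EMMA_NS = "http://www.w3.org/2003/04/emma"

# Hypothetical recognizer output: two competing readings of one utterance.
# The <destination> payload is application-specific and made up here.
SAMPLE = f"""<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
                         emma:tokens="flights to boston">
      <destination>Boston</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.20"
                         emma:tokens="flights to austin">
      <destination>Austin</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""


def best_interpretation(emma_xml):
    """Return the interpretation element with the highest confidence.

    Interpretations without an emma:confidence attribute are treated as 0.
    Returns None when the document contains no interpretations.
    """
    root = ET.fromstring(emma_xml)
    candidates = root.iter(f"{{{EMMA_NS}}}interpretation")
    return max(
        candidates,
        key=lambda el: float(el.get(f"{{{EMMA_NS}}}confidence", "0")),
        default=None,
    )


best = best_interpretation(SAMPLE)
print(best.get("id"), best.findtext("destination"))  # prints: int1 Boston
```

A real application would probably not stop at the top score. Because the lower-ranked hypotheses stay in the document, a later processing stage could fall back to one of them or ask the user to confirm when the scores are close.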

Most cell phones today can receive both voice and text input. With EMMA, it will be easier to create applications that can take advantage of text, voice, or both.

Applications created using the EMMA standard are also likely to benefit people with disabilities by providing alternate methods for Web interaction and access.
