December 11, 2009
By Leonard Klie, Editor, Speech Technology and CRM magazines
Speech Technology News

W3C Tackles Emotion and Multimodality

In addition to its work on VoiceXML 3.0, which it released in working draft form earlier this month, the World Wide Web Consortium (W3C) has been hard at work on a number of other specifications that will have serious implications for the speech technology community.

Among them is the first public working draft of Emotion Markup Language (EmotionML) 1.0, which allows emotions and related states to be represented in technological applications. The language is conceived as a plug-in language suitable for three uses: manual annotation of data, automatic recognition of emotion-related states from user behavior, and generation of emotion-related system behavior.

“That might be useful, for example, in a call center, for representing the output of an emotion detector that would detect an angry or upset customer,” explains Deborah Dahl, chair of the W3C’s Multimodal Interaction Working Group, which released the draft.

Outside the call center, the standard would be valuable for generating emotional output in a text-to-speech system and for working with avatars so that the facial expression and the voice match, according to Dahl.

One of the main problems in programming such technology, however, is that the emotion vocabulary an application needs often depends on the context of use. Using EmotionML, programmers can build some emotions into pre-established vocabularies and do custom work for the others, Dahl explains.
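
As a rough illustration of that idea, the Python sketch below checks an emotion label against either a shared vocabulary or one supplied by the application. It is not EmotionML syntax and is not taken from the W3C draft; the vocabulary names, the labels, the EmotionAnnotation class, and the annotate function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical shared vocabulary for illustration; not taken from the EmotionML draft.
PREDEFINED_VOCABULARIES = {
    "everyday": {"angry", "happy", "sad", "afraid", "neutral"},
}


@dataclass(frozen=True)
class EmotionAnnotation:
    vocabulary: str    # name of the vocabulary the label comes from
    label: str         # emotion label, e.g. "angry"
    confidence: float  # detector confidence in [0.0, 1.0]


def annotate(label, confidence, vocabulary="everyday", custom_vocabularies=None):
    """Build an annotation, checking the label against the chosen vocabulary."""
    vocabularies = dict(PREDEFINED_VOCABULARIES)
    vocabularies.update(custom_vocabularies or {})
    if vocabulary not in vocabularies:
        raise ValueError(f"unknown vocabulary: {vocabulary!r}")
    if label not in vocabularies[vocabulary]:
        raise ValueError(f"{label!r} is not in vocabulary {vocabulary!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return EmotionAnnotation(vocabulary, label, confidence)


# A call center detector might rely on the shared vocabulary...
upset_caller = annotate("angry", 0.82)

# ...while an application with its own needs supplies a custom one.
call_center_terms = {"call_center": {"frustrated", "satisfied", "confused"}}
frustrated_caller = annotate("frustrated", 0.67, "call_center", call_center_terms)
```

Keeping the vocabulary name on each annotation means a consumer, such as a text-to-speech engine or an avatar renderer, can tell which set of labels it is being asked to interpret.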

The Multimodal Interaction Working Group also recently published an updated working draft of the Multimodal Architecture, which creates a framework for all the components of a multimodal application to work together.

Dahl says this draft is significantly different from the previous one, with clarifications to its relationship with the Extensible Multimodal Annotation (EMMA) specification, simplified architecture constituents, a description of HTTP transport of lifecycle events, and the addition of an example of a handwriting recognition modality component.

“This is the W3C's language for handling the communication among the components of a distributed multimodal application. For example, speech recognition and dialogue logic might take place in the cloud, while the GUI display would happen locally on the device,” Dahl explains.

The architecture, she adds, will be open so that as new input methods become available, they could be incorporated easily into the larger application. “You can add modalities. It would be just one more application layer that fits into the architecture,” she says.
