The Multimodal Interaction Working Group of the World Wide Web Consortium (W3C) in late May advanced Emotion Markup Language (EmotionML) 1.0 to formal Recommendation status, effectively making it the industry standard for representing human emotions within computer applications.
The language, according to the W3C recommendation, enables systems that observe human-to-human interactions, such as customer frustration monitoring in call center applications, to identify and represent the emotional elements so the system can process interactions accordingly.
EmotionML is not itself the technology that enables computer systems to recognize and generate emotion, explains Deborah Dahl, chair of the W3C working group. But, "because it's a standard language for expressing emotion, it will make it much easier for different implementations of emotion technology to work together," she says. "It will make a lot of really cool things possible."
EmotionML is a plug-in language suitable for use in manual annotation of material involving emotions, such as videos or speech recordings, automatic recognition of emotions from user behavior, and the generation of emotion-related system responses. The W3C committee had been working on the standard since 2007, with the first working draft published in 2009.
EmotionML could be applied to expressive speech synthesis, generating synthetic speech that sounds happy, sad, friendly, or apologetic; emotion recognition, such as spotting angry customers in speech dialogue systems; support for people with disabilities, enabling them to identify the emotional context of written or spoken content; and marking up media transcripts and captions to help deaf or hearing-impaired people who cannot hear a soundtrack. The plug-in language could also aid opinion mining/sentiment analysis to automatically track customers' attitudes regarding a product across blogs and other text-based media, fear detection for surveillance purposes, and character design and control for video games and virtual worlds.
Dahl points out that EmotionML can help compare what a person is saying with his facial expressions and gestures, something that would come in handy when analyzing input from focus groups.
As the Web becomes ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions, she adds.
To that end, EmotionML "should stimulate a lot of good applications, such as more expressive text-to-speech, better handling of angry or unhappy callers in the call center, assistive devices for people who have trouble recognizing emotions, more realistic avatars, etc.," Dahl says. "It will be a more and more important standard as emotion recognition and generation technology become available and are integrated into other systems."
The industry has yet to agree on a full list of the emotional states to be represented or the names to be given to them, because the list of emotion-related states that should be distinguished varies depending on the application domain. Basically, the vocabulary needed depends on the context of use. An app for collecting contact center surveys, for example, might want to consider anger, while the speech synthesis portion of an IVR would likely not want to convey that emotion.
EmotionML merely identifies and defines the possible structural elements to be used and allows users to plug in the vocabularies that they consider appropriate for their work.
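A minimal sketch of what this looks like in practice, based on the W3C specification (the specific vocabulary URI, emotion label, and values here are illustrative): the `category-set` attribute plugs in a chosen vocabulary, and each `<emotion>` element annotates an observation using terms from that vocabulary.

```xml
<emotionml xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
  <!-- The vocabulary is plugged in via category-set rather than
       baked into the language; a different application could point
       this attribute at its own domain-specific emotion list. -->
  <emotion>
    <!-- Anger rated at 0.8 of full intensity; the annotator (or
         recognizer) is 0.9 confident in the label. -->
    <category name="anger" value="0.8" confidence="0.9"/>
  </emotion>
</emotionml>
```

A contact center application might attach such markup to recognizer output, while a speech synthesis system could consume the same structure with an entirely different vocabulary plugged in.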
Dahl says this is something that will more likely be handled by system designers. "Vendors will probably incorporate the types of emotions they want to register right into their speech recognition applications," she states.