The 2014 Speech Luminaries

enable interoperability between systems processing the data. The standard can be applied to manual annotation of data, automatic recognition of emotion-related states from user behavior, and generation of emotion-related system behavior. This could yield more expressive and human-sounding speech synthesis for interactive voice response systems, animated characters, robots, avatars, and personal assistants, but will also play a role in opinion mining and sentiment analysis, among other uses.

EmotionML can be used in conjunction with other markup languages such as EMMA, an extensible multimodal annotation language, and SSML, a speech synthesis language, both of which Dahl had a hand in drafting.

As a working group chair at the W3C, Dahl has also worked on VoiceXML 2.0, the Speech Recognition Grammar Specification, the Multimodal Interaction Architecture, and Ink Markup Language (InkML). She serves on the W3C's HTML5 Working Group and the Web Accessibility Initiative's Protocols and Formats Working Group, and is on the board of directors of the Applied Voice Input/Output Society (AVIOS).

Dahl was named a 2012 Speech Technology Luminary for her work on EMMA, InkML, and the Multimodal Architecture. James Larson, vice president of Larson Technical Services and cochair of the W3C Voice Browser Working Group, credits Dahl with leading the W3C Multimodal Working Group and its development of a collection of standards like these that, he says, "will form the basis for helping users interact with all kinds of devices in the Internet of Things."

All told, Dahl has more than 25 years of experience in the speech and natural language processing industry, as a research scientist and engineer at Unisys and then at NewInteractions. She has been principal at Conversational Technologies, a firm that provides reports, analysis, training, software, and design services in speech recognition and natural language processing, since 2002.

The Visionary Validator
Todd Mozer, Founder, Sensory

Todd Mozer is the founder of Sensory, a 20-year-old company that provides solutions for speech synthesis, speech recognition, speaker verification, vision, and other technologies. The company licenses its speech and vision solutions to consumer electronics manufacturers, which embed them in products such as cars, mobile devices, home appliances, wearables, and toys.

The company has long been in the mobile handset platform field, with its "always listening" touchless control that wakes up devices only when needed, enabling them to "sleep" and conserve power when not in use. To date, Sensory's technology has shipped in more than half a billion consumer products, and the company has OEM partnerships with manufacturers including Samsung and Motorola, covering not only mobile devices but wearables as well.

"We've been focused on the consumer electronic market from the start, and for the first ten years of our existence, we were sort of a lone wolf, with [not] a lot of companies putting attention in that area," Mozer says. "Everybody was focused on PCs and telecom, and we were talking about making toys and clocks and home appliances, anything that you could talk to and control by voice."

A new direction for Sensory is in the vision field, which was one of the original ideas behind the company, Mozer says. Sensory is using machine learning technologies and neural networks to make it easier to communicate with consumer electronics. "[This will work] the same way humans communicate with each other, using our eyes and our sensory functions as well," he says.

Sensory's vision technology appears in TrulySecure, an on-device biometrics solution launched in June that combines speaker verification with the company's facial recognition engine. The technology presents an alternative to fingerprint and PIN authentication for mobile devices.

As for the future, Mozer says the company will likely bring more sensory functions into the user experience over time. "Right now we're doing vision and speech, and there are a whole lot of new opportunities," he says. "That will probably keep us busy for a while, but I wouldn't be surprised if we started using the other sensors that are on devices like mobile phones to bring in more information and figure out more things."
