
The 2012 Speech Luminaries


Each year, we honor individuals who push the boundaries of the speech technology industry, and, in doing so, influence others in a significant way. This year's Luminaries do exactly that.

Two popularized the mobile voice-powered digital assistant by making the technology fun and cool; one is furthering speech standards for multimodal interaction and emotion detection; another published a book on speech technology and became the latest to spearhead the International Computer Science Institute; and the final Luminary steered his company through the acquisition of six vendors in just one year. These are impressive accomplishments for the span of a single year, and there's no doubt that the benefits of their efforts will be far-reaching.

Siri(ous) Strategists:

Adam Cheyer, director of engineering, iPhone group, Apple
Tom Gruber, product designer, Apple

Even if you don't own an Apple iPhone 4S, you can't escape Siri's impact. The personal voice assistant makes headlines almost daily. Adam Cheyer and Tom Gruber, who brought Siri to the world, have become stars in the speech technology world.

Cheyer was cofounder and vice president of engineering at Siri, which was acquired by Apple, where he is now director of engineering in the iPhone group. Previously, he was a program director in SRI International's (SRI) Artificial Intelligence Center. Gruber was cofounder, chief technology officer, and vice president of design for Siri. He is now a product designer at Apple.

Siri's roots can be traced to 2003, when SRI and a team of 20 research organizations were awarded funding to develop software named CALO (Cognitive Assistant that Learns and Organizes) to support the military. It was considered the largest artificial intelligence project in history.

In 2007, SRI spun off Siri. The company received funding from major backers and partnered with Nuance Communications to power its speech recognition. In 2010, Siri launched the public beta of its virtual personal assistant for the iPhone 3GS, available for free in the App Store. Soon after, Apple acquired Siri, and in late 2011, offered it on the iPhone 4S.

"Siri instantly became the symbol for mobile paradigm shift," Craig Bassin, CEO of EasyAsk, told Speech Technology last year."Actually, more than that, it makes it a complete paradigm shift in the way people interact with computers."

A recent survey by research firm Parks Associates found that more than half of U.S. users of Apple's iPhone 4S are "very satisfied" with the Siri voice-command feature. The survey also found that an additional 21 percent are "satisfied."

It's rumored Apple could soon launch a TV with a Siri voice interface. The popularity of the natural-language user interface could determine the extent of Apple's impact on the TV market, note analysts at Parks Associates.

"Siri showed the power of coupling speech recognition and natural language understanding technology to provide a personal assistant experience that feels like a person," says Bill Meisel, president of TMA Associates.

Multimodal Maven:

Deborah Dahl, chair, Multimodal Interaction Working Group, World Wide Web Consortium

As chair of the Multimodal Interaction Working Group at the World Wide Web Consortium (W3C), Deborah Dahl has had a hand in drafting a number of standards for human-machine interaction, including VoiceXML 2.0, the Speech Recognition Grammar Specification (SRGS), and the Multimodal Architecture and Interfaces specification. This past year was certainly a busy one.

In September, she led efforts to author the W3C's recommendation for the Ink Markup Language (InkML), a data format for capturing pen or finger traces on a screen as an input modality. In January, the Multimodal Architecture and Interfaces specification became a W3C candidate recommendation, and a month later, Extensible Multimodal Annotation (EMMA) 1.1 was released as a draft.
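
For readers who haven't seen InkML, the short Python sketch below shows the basic idea under simple assumptions: a pen or finger stroke is captured as a series of x/y points and serialized as an InkML trace element. The coordinates are invented and the helper function is hypothetical; only the namespace comes from the W3C specification.

    # A minimal sketch (not from the article): serializing one pen stroke as InkML.
    # The coordinates are invented sample data; the namespace is the W3C InkML namespace.
    import xml.etree.ElementTree as ET

    INKML_NS = "http://www.w3.org/2003/InkML"

    def stroke_to_inkml(points):
        """Build an <ink> document containing a single <trace> from (x, y) tuples."""
        ET.register_namespace("", INKML_NS)
        ink = ET.Element(f"{{{INKML_NS}}}ink")
        trace = ET.SubElement(ink, f"{{{INKML_NS}}}trace")
        # An InkML trace lists coordinate samples, with commas separating the points.
        trace.text = ", ".join(f"{x} {y}" for x, y in points)
        return ET.tostring(ink, encoding="unicode")

    # A short, made-up stroke: four points tracing a rough diagonal.
    print(stroke_to_inkml([(10, 10), (12, 14), (15, 19), (19, 25)]))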

The original version of EMMA was released in 2009 and implemented, in part, in AT&T's speech mashup, Openstream's Cue-Me platform, and Microsoft's Tellme platform and Office 2010. The W3C received considerable feedback from these implementations and has taken suggestions for features that would make EMMA easier to use, more powerful, and more convenient into account in drafting version 1.1.
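
To make the EMMA discussion concrete, here is a small, purely illustrative Python sketch of the kind of annotation EMMA carries: a single recognized utterance wrapped in an interpretation element tagged with its modality, tokens, and confidence score. The utterance, score, and helper function are invented for illustration and are not drawn from any of the implementations above; the namespace and attribute names follow the EMMA 1.0 recommendation.

    # A minimal sketch (not from the article): a hypothetical EMMA-style result for
    # one recognized utterance. The values are invented; the namespace is W3C's EMMA namespace.
    import xml.etree.ElementTree as ET

    EMMA_NS = "http://www.w3.org/2003/04/emma"

    def build_emma_result(tokens, confidence):
        """Wrap a speech recognition hypothesis in an emma:interpretation element."""
        ET.register_namespace("emma", EMMA_NS)
        root = ET.Element(f"{{{EMMA_NS}}}emma", {"version": "1.0"})
        ET.SubElement(
            root,
            f"{{{EMMA_NS}}}interpretation",
            {
                f"{{{EMMA_NS}}}medium": "acoustic",       # the input arrived as audio
                f"{{{EMMA_NS}}}mode": "voice",            # spoken, as opposed to ink or keys
                f"{{{EMMA_NS}}}tokens": " ".join(tokens),
                f"{{{EMMA_NS}}}confidence": str(confidence),
            },
        )
        return ET.tostring(root, encoding="unicode")

    # Invented example: a recognizer's best hypothesis with a 0.87 confidence score.
    print(build_emma_result(["book", "a", "flight", "to", "boston"], 0.87))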

EmotionML, a markup language for representing emotions recognized from voice, facial expressions, or other modalities, is also in the W3C Multimodal Interaction Working Group's pipeline this year. EmotionML will provide input to text-to-speech engines or avatars that express emotion, and EMMA 1.1 promises to make that recognition more precise.
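
And here is a comparable, purely illustrative sketch of an EmotionML-style annotation: one detected emotional state expressed as a category with a confidence score. The category name, score, and helper function are invented, and the vocabulary declaration a real document would include is omitted for brevity; only the namespace comes from the W3C draft.

    # A minimal sketch (not from the article): a hypothetical EmotionML-style annotation
    # attaching an emotion category and confidence to a detected emotional state.
    # A real document would also declare which emotion vocabulary the category name
    # comes from; that detail is omitted here for brevity.
    import xml.etree.ElementTree as ET

    EMO_NS = "http://www.w3.org/2009/10/emotionml"

    def annotate_emotion(category, confidence):
        """Build an <emotionml> document holding one <emotion> with a <category>."""
        ET.register_namespace("", EMO_NS)
        root = ET.Element(f"{{{EMO_NS}}}emotionml")
        emotion = ET.SubElement(root, f"{{{EMO_NS}}}emotion")
        ET.SubElement(
            emotion,
            f"{{{EMO_NS}}}category",
            {"name": category, "confidence": str(confidence)},
        )
        return ET.tostring(root, encoding="unicode")

    # Invented example: a caller judged to sound frustrated with 72 percent confidence.
    print(annotate_emotion("frustration", 0.72))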

As principal at speech and language consulting firm Conversational Technologies since 2002, Dahl helps clients apply the state of the art in speech recognition, natural language, spoken dialogue systems, assistive technologies, and Web and multimodal technologies to innovative products.

In her current capacity, she draws on years of experience at Unisys commercializing spoken language understanding technology and developing natural language technology, speech grammars, dialogue systems, and text classification. She was director of core product engineering in Unisys's Natural Language Processing Group and a principal investigator on a joint U.S. Department of Defense–Unisys project to blend natural language understanding with speech recognition.

Dahl also serves on the board of the Applied Voice Input/Output Society (AVIOS) and participates in the W3C's Voice Browser Working Group.
