The 2011 Speech Luminaries
Posted Jul 1, 2011

We’re all familiar with the personal trainer’s mantra “no pain, no gain.” The idea that growth is possible only when you push yourself beyond your comfort zone applies to our four Luminaries. Knowing that standing still leads to contraction, these leaders have made bold moves to anticipate the direction of the industry. One has completed a dizzying 44 acquisitions in about a decade, with a recent deal fortifying his company’s presence in speech security. Another has pushed user interfaces based on speech, touch, and gesture to satisfy consumers’ appetite for connected devices. A third has spearheaded standards development to unify disparate efforts. And our fourth winner, bowing to the smartphone craze, has supported customer empowerment by creating a mobile division. Four visionaries, four stories.

Empire Building
Paul Ricci
Chairman and CEO, Nuance Communications

When rumors of an Apple deal to purchase Nuance Communications surfaced in early May, many assumed it would happen because Steve Jobs tends to get what he wants. However, Jobs’ counterpart at Nuance, Chairman and CEO Paul Ricci, who has initiated many acquisitions of his own, is used to getting what he wants, too. In the end, the two sides couldn’t reach an agreement, and the deal didn’t materialize.

Had the deal gone through, Apple would have acquired a company that has made 44 acquisitions since Ricci joined Nuance in 2000. Time and again, Nuance has faced the classic build-it-or-buy-it choice, and in most cases, the buy-it option has won out. (At press time, Nuance had just announced its acquisition of SVOX, a provider of voice solutions for in-car systems.)

Case in point: just as 2010 was ending, Nuance acquired PerSay, a provider of voice biometrics solutions, for an undisclosed sum. What made the PerSay deal unique, however, was that it would turn Nuance, a leader in speech recognition and speech synthesis, into a formidable force in an area where it had only a marginal presence: speech security. The acquisition brought Nuance one of the strongest providers of voice authentication solutions. In addition, the deal expanded Nuance’s customer base in voice biometrics to include some of the largest known customer-facing authentication applications in the world.

“There is strong growth and interest [in biometrics],” Chuck Buffum, vice president of authentication solutions at Nuance, said at the time of the PerSay deal. “It’s finally emerging. We think we’ve positioned Nuance to have all the right products and solutions around the world now and to maximize and leverage that growth.”

Ricci, an economist by training, is not done building his speech empire. He started it in 2001, when he swooped in and paid $39 million for the troubled Lernout & Hauspie, which had gone bankrupt after a huge accounting scandal.

As for Apple, its interest in Nuance remains. With Apple set to unveil iOS 5, the latest version of its mobile platform, reports suggest the company would be willing to strike a licensing deal with Nuance to bring voice recognition and synthesis to its mobile devices. Though Apple likely could develop the technology on its own, it probably could not finish the work in time for this year’s iOS 5 release. Ricci just might have Apple where he wants it, rather than the other way around.
—Leonard Klie

Connecting, Naturally
Zig Serafin
General Manager of the Speech Group at Microsoft

The growth of connected devices, in everything from cars to mobile phones, is ushering in a broad technology shift toward more integrated, natural experiences driven by speech, touch, and gesture. In the past year, no one has been more prominent in that effort than Zig Serafin, general manager of the Speech Group at Microsoft. His focus on the Tellme platform, which Microsoft acquired in 2007, has won Microsoft a larger share of the speech market, opened new revenue streams, and fundamentally changed how people stay connected.

One product that has transformed how people interact with devices is Kinect for the Xbox 360 video game console, which Microsoft released in November. Kinect’s voice technologies let users control devices and games and hold headset-free party chats over Xbox Live. Kinect also lets users control their televisions by voice, calling up ESPN, for example, simply by saying what they would like to watch.

Also in the fall, Microsoft released Windows Phone 7, which integrates speech into the phone for functions such as search, navigation, and dialing. On Windows Phone 7 devices, users of Bing voice search can ask, “Who is pitching for the Giants tonight?” and get a listing of pitchers, as well as ticket and weather information.

Microsoft also expanded the voice search capabilities of Bing, and users have embraced the change. In fact, one in five searches on Bing for Mobile is performed by voice.

Moreover, building on its success with the Ford Sync and MyFord Touch automotive telematics systems, Microsoft and Kia codeveloped the UVO multimedia and infotainment system, which the Korean automaker rolled out in its new Sportage, Sorento, and Optima models late last year. UVO lets users access media content and connect with people through quick voice commands without having to navigate hierarchical menus.

As natural user interfaces become further integrated into technology, customers will be able to interact more naturally, whether in front of the TV, in the car, on the go with their mobile devices, or when dealing with businesses through customer-care applications. If Serafin has anything to say about it (pun intended), Microsoft will continue to be at the center of it all.
—Leonard Klie

Steward of Standards
Dan Burnett
Director of Speech Technologies and Standards at Voxeo

Estimates suggest that 85 percent of IVR systems deployed in North America and Western Europe use VoiceXML and Speech Synthesis Markup Language (SSML). A new version of SSML, ushered in last September by Dan Burnett, director of speech technologies and standards at Voxeo, has opened the language to even more markets.

SSML 1.1, which Burnett spearheaded as co-chair of the Voice Browser Working Group of the World Wide Web Consortium (W3C), extends speech on the Web to an enormous new market by improving support for Asian languages and multilingual voice applications. SSML 1.1 also provides control over voice selection and such speech characteristics as pronunciation, volume, and pitch. What’s more, TTS control is extended to more parameters. The trimming attribute, for example, enables different extracts of prompts or audio files to be rendered according to context.

“With SSML 1.1, there is an intentional focus on Asian language support, including Chinese languages, Japanese, Thai, Urdu, and others, to provide a wide deployment potential,” Burnett says. “With SSML 1.0, we already had strong traction in North America and Western Europe, so this focus makes SSML 1.1 incredibly strong globally.”

The multilingual 1.1 enhancements stem from discussions Burnett led in China, Greece, and India. He “organized and led an international team of experts to sift through requirements, analyze alternative strategies, and define extensions that enable SSML to represent the various aspects of human language pronunciation,” says Jim Larson, co-chair of the W3C Voice Browser Working Group. “I believe Dan has brought the world a little closer together by helping to create a single computer language that can represent how people speak around the world.”

“The SSML specification is a significant development for application developers and technology integrators working around speech, as it hugely simplifies the creation of speech-based applications on the Web and elsewhere,” Paolo Baggia, director of international standards at Loquendo and coauthor of the standard, said in a statement.

Burnett is no stranger to standards development. During the past nine years, he has led many speech standards efforts in the W3C; for example, he has served as editor of VoiceXML 2.0, 2.1, and 3.0.
—Leonard Klie

Bringing Speech to Mobile
Zor Gorelov
Cofounder, Chairman, and CEO at SpeechCycle

Recognizing the smartphone’s growing importance as a consumer tool for contacting customer care, Zor Gorelov, cofounder, chairman, and CEO of SpeechCycle, launched a Mobile Division in March devoted exclusively to meeting demand for customer care solutions on the smartphone. His goal was to dedicate significant resources to creating products and solutions that support the operating systems of leading smartphones and other mobile devices.

Mobile devices, he reasoned, offer opportunities for companies to deliver a highly differentiated customer experience that drives customer loyalty, optimizes contact center operations, and increases revenue.

Many believed the launch of the Mobile Division was a natural progression for SpeechCycle, but for Gorelov, it was anything but business as usual. Under his leadership, SpeechCycle, which developed the class of customer interaction management solutions known as rich phone applications (RPAs), has become one of the fastest-growing companies in speech technology. So it’s no surprise that this pioneer of robust, phone-based self-service applications processed its one-billionth speech recognition interaction through its hosted RPA OnDemand platform earlier this year.

RPAs, which combine high-definition natural-language understanding with deep integration into customer back-end systems to achieve up to three times the automation rates of traditional phone-based IVR systems, have been the foundation of SpeechCycle’s business since Gorelov founded the company in 2001 with partners Ruth Brown and Victor Goltsman. The trio previously founded BuzzCompany.com, a developer of enterprise collaboration and messaging solutions that was acquired by Multex (now part of Reuters) in 2000.

Before founding BuzzCompany.com, Gorelov was a senior consultant at Microsoft and held management positions at Computron Software and Information Builders. His interest in speech technology dates to 1992, when he worked at Bell Labs.

Ultimately, SpeechCycle’s success rests on its ability to support customer empowerment across every channel, the foundation of true self-service.
—Leonard Klie

News Editor Leonard Klie can be reached at lklie@infotoday.com.