The Intelligent System: Thoughts From Day Two of SpeechTek 2013

NEW YORK—"We're in an era where voice recognition has been accepted as an equal partner in mobile interface," according to Vlad Sejnoha, chief technology officer for Nuance Communications.

Speaking from a keynote panel on the second day of SpeechTek 2013, Sejnoha argued that speech has become a powerful modality capable of doing more than just making up for the shortcomings of the small touch pad screens on mobile devices. It is capable now of cutting through "onerous visual hierarchies."

If you want to send an email, for instance, with Siri on an iPhone you can say, "Email John Doe," and a compose window will open in Mail, ready with John Doe's email address already plugged in. If you did that visually via touch pad, by contrast, you'd have to open the app, open a compose window, and begin typing John Doe's name.

Whereas a few years ago many conversations around speech seemed to ask what speech's place would be in an increasingly multimodal mobile world, the conversations on the second day of this year's SpeechTek conference seemed to take a large place for speech as a given.

"Touch didn't supplant speech," says Rob Spectre, director of developer evangelism for Twilio. "It raised the expectations around it."

In a panel discussion titled "The Changing Role of Speech in a Multichannel World," Spectre said he recently switched from an iPhone to a Droid device primarily because of the strength of Google's predictive and anticipatory mobile behaviors. He gave as an example the Boston Red Sox scores he gets on his phone. When he turns it on in the morning, he sees yesterday's scores. When he turns it on in the evening on a game night, he gets the time of the next game, all without ever having asked for it. Google, based on Spectre's search history, reckoned he was a Sox fan and personalized his mobile experience.

Expectations around speech follow a similar trajectory. Mobile users expect to be able to ask their devices for increasingly complex things ("Where is there good food around here? Make me a reservation and make sure it's cheap.") and become frustrated when those requests aren't fulfilled. For many at the conference, the key to delivering on speech is not just accurate recognition of direct commands but the ability to leverage cloud data and past behaviors to meet vague and general requests, like the one above, with desirable results. In short, the mobile device is expected to know its users.

Speech is not the only modality contributing to an intelligent system, cautions Mazin Gilbert, assistant vice president of intelligent systems research for AT&T; there are visual ones as well.

"We're investing a lot in contextual dialogue, biometrics, visuals, image recognition," he says. "All of these have to come together [for prediction]." Our phones have to be able to watch us to serve us, he hinted.

Interestingly, a few notes of growing concern over privacy and security issues around so-called intelligent systems were also voiced this year. Whereas in the past many conversations at SpeechTek have focused squarely on advancements in quality of recognition, the technology has now advanced far enough that discussion is beginning to touch on the deeper political, philosophical, and civic implications of its use.

"I think we'll see the evolution of [the] virtual assistant into [a] virtual advisor, but what's its purpose?" asks Dan Miller, senior analyst at Opus Research. "Its purpose is not to be human-like...[but] how are we going to control this? We're all going to want it, but it's also going to creep us out."

It could be that these are just echoes of the recent revelations of mass, warrantless NSA surveillance of American citizens, but it's not without significance that the technology has become powerful enough to raise these questions.
