Opening Keynote at Voice Search 2009 Looks to Future of Technology
SAN DIEGO--Three speech experts took the stage yesterday to discuss the future of voice search technology at the opening session of the Voice Search 2009 conference.
Johan Schalwyck, a senior staff engineer at Google; Melvin Hunt, co-founder of Novauris Technologies; and Marsal Gavalda, director of incubation at Nexidia, were all on hand to offer their opinions.
Taking a broad view of the industry, Schalwyck asserts, “In speech, trends are incremental.” He says the technology matures steadily over time, building on itself through gradual accretion rather than through sudden breakthroughs. He attributes the recent development of voice search applications to speech technology finally reaching a level where its use is economically viable for a number of companies.
The increased market viability of speech has been one of the driving factors of growth, particularly in mobile speech, which has seen a kind of renaissance with the birth of the iPhone.
Attesting to the growth of new markets, Jay Wilpon, executive director of speech services research for AT&T, showed an advertisement for a prototype of a mobile search program in a talk on advanced applications. As he cued up the spot, he told the conference that in his decades of experience in speech at AT&T, it was the first time he had ever seen an advertisement produced for a speech product.
Traditionally, speech has focused on cost-cutting solutions and, with its implications for labor, has not been advertised to consumers. The mobile market, however, opens new doors and new direct-to-consumer revenue streams. Much of this year’s conference is focused on that shift and on developments in the mobile speech sector.
But even as the market grows, Schalwyck still doesn’t see any earthshaking breakthroughs in the foreseeable future of speech. He expects the technology to keep growing at the same steady pace it has historically. The technology, he explains, is complex, requires a tremendous amount of thought and planning, and does not get built overnight.
“It’s the nature of the beast,” he says.
Dave Thompson from SpinVox, however, counters that he does see development quickening as the market expands. He expects the infusion of capital from sales to help speed developments along.
The panel also sees technological challenges on the horizon for voice search.
According to Hunt, one of the biggest challenges facing voice search is handling non-native speakers: creating a system that is inclusive and fully accounts for the idiosyncrasies of differing pronunciations. The problem, as he sees it, is twofold in places like the United States: beyond the technical difficulty, it is potentially offensive to suggest that non-native accents pose a problem for a speech system.
“It is deeply insulting to Americans to suggest that their accent cannot be recognized,” Hunt says, intimating that such a suggestion is tantamount to rejecting a speaker’s very American-ness.
Gavalda claims that Nexidia has begun to meet these challenges by including non-native speakers in its sample of utterances and building various responses into its data. Hunt, for his part, is skeptical.
“I don’t believe that simply training on diversified accents does the trick,” he says. “You have to know what the speaker’s background is, where they’re from.” Only with that level of understanding, he suggests, can recognition work on a fundamental level.