Is Speech Still in the Cards for iPhone?
Speech tech industry insiders greeted the release of Apple’s iPhone with a cry of anguish because the much-hyped device lacked a voice interface. The dream might not be over, though.
In early August, VoiceSignal—recently acquired by Nuance Communications—posted two videos on its Web site demonstrating a speech interface that it says works with the iPhone. In the first video, a user touches the Vtunes logo. A prompt asks, "What artist?" The user says, "My Morning Jacket," and the song "Dancefloors" plays. In the second video, a voice search application allows users to search by industry or business name.
The point of the demo was to show that the iPhone has speech interface potential. "This was a capabilities demonstration and is not commercially available. [It was] just to show that it was possible, that it can be done, and what it might look like if you were to do it," says Stewart Sims, VoiceSignal’s executive vice president of marketing. "A full integration of a voice platform," he adds, "would require a close working relationship with Apple."
But perhaps Apple is staking a claim of its own. Past patent applications have some speculating that Apple plans to keep building on its earlier research into speech interfaces for general-purpose computing. In 2003, the company filed a patent application for a voice menu system and followed it in 2004 with one for an audio user interface for computing devices; both outlined "a method for providing an audible user interface for a user of a computing device." These plans included text-to-speech, speech search, and voice controls to operate media.
But while all of this is enticing, it’s premature to bank on a voice platform in the next-generation iPhone. And disappointment within the speech tech industry is still keenly felt.
"Apple has a small research group—very small. They haven’t done much in the last couple of years," says Jim Larson, an independent consultant at Larson Tech. "I think if Apple wants a speech interface, they’ll buy it from somebody. They can buy it from Nuance. They can buy it from Google. They can buy it from Yahoo! They’re very good at doing this kind of thing. So I can’t foresee iPhone Version 2 having this sort of thing."
That the iPhone doesn’t have a voice platform flabbergasts Larson, who calls its release "a big blow to our industry." "It’s established the idea that we don’t need speech recognition on these devices," he says. "The iPhone promises to be very successful without speech recognition, and I view this as leaving speech technology in the dust. We may have lost our opportunity to infiltrate these handheld devices."
One can speculate that Apple skipped a speech platform because it thinks the technology isn’t ready for a large commercial rollout. Larson sees two problems. The first is speech recognition, especially in the noisy outdoor environments where people tend to use their cell phones. The second is natural language processing: at this point, it’s implausible for a handheld device to parse complex requests.
The solution might be IVR-styled dialogue in which an agent focuses user requests. "But here’s the neat thing," Larson says. "Rather than having people listen to the answer—because people have a hard time hearing—it will be displayed on that little screen. So we don’t have to develop grammars as we did before. It’s an IVR with visual output and voice input."
Thus far, this remains a pleasant fantasy, but there is hope that research from Google will add a spark to embedded speech interfaces. Google has already positioned itself as a major player in speech search and is now rumored to be developing a Google Phone, or Gphone. Whispers in the blogosphere about such a device—reportedly loaded with Google software, including Gmail, Google Talk, instant messaging, voice calling, and mapping—have been circulating for about a year, but have started to heat up again in recent weeks.