Start the Revolution Without Us

I attended the recent eComm Europe 2009 show in Amsterdam. Among the innovations presented were free, community-based, citywide Wi-Fi networks using home-built antennas; experiments in reincarnating the phone booth; and Layar, which combines the real-time view from your phone’s camera with your GPS coordinates to provide a visual overlay of geographic information.

However, I didn’t see a single application that used speech technology per se, and even telephony was largely absent. The exceptions were few: RebelVox’s re-engineering of the phone call is almost ready for alpha release; Voxeo explained its new mashup-ready technical/business model; Skype discussed SILK, its royalty-free wideband audio codec; and a smattering of other applications included some telephony. Having heard most of the presentations about current and just-around-the-corner technologies, I sensed a consensus that the next stage of the communications revolution is already in progress, and it’s in text-only mode. Speech technology, and even speech applications, need not apply.

Harsh, I know, but let’s look at social networking, the latest group of communication methods to emerge from the Internet. Individuals, businesses, and even politicians now use Facebook and Twitter to communicate with friends, customers, and constituents. Although I wonder how long this interest will last, I don’t doubt that Internet-based social networking tools will remain with us for decades, even after the current tools evolve into something new. So here we are with a full-blown emerging industry, a new semipermanent fixture in our culture, and I challenge you to think of a single social networking tool that centers on speech or even uses speech as an essential component. Skype, iChat, and other instant messaging services don’t count, because they offer communications, not social networking. The core of the revolution, from LiveJournal to LinkedIn to Twitter, remains text-only, with a few video and picture-sharing sites thrown in. We’re even starting to see mobile phones designed entirely around social media, and phone-sized mobile devices that completely lack telephony services.

This isn’t to say there’s absolutely no speech involved in the current revolution. Phweet (a blend of “phone” and “tweet”) demonstrated phone-call integration with Twitter, and PhoneFromHere showed off its Google Wave “gadget,” which brings Skype voice calls into a Wave, complete with annotatable recordings.

Participants asked me, as they always do, when speech recognition and text-to-speech will be available for instant messaging, blogging, and the like. I find that users hunger for these interfaces, though I wonder how many are actually prepared to pay for them. Regardless, in all of these cases the central utility of social networking revolves around text, not speech, and speech is simply an afterthought that provides hands-free/eyes-free access.

Our industry’s free ride is over. We can no longer coast on the popularity of speech as the primary interface to services, and our uphill slog to persuade companies to incorporate speech into products and services will only get harder. Certainly we’re the very highest of high tech. We still enjoy the benefits of Moore’s Law: our relative footprint inside any computer-driven device keeps shrinking, which means greater accuracy and the ability to place speech into more devices. But speech in general, and telephony in particular, missed out on the latest Internet revolution, and now we must compete with more enticing and exciting interfaces.

And let’s not forget the most important issue of all: money. An instant-messaging robot can answer your queries without having to decipher the acoustics of an utterance or generate a decent text-to-speech audio response, and, above all, without paying the absurd premium for data bits marked “telephony” instead of “text.” Companies have an enormous incentive to move customers away from speech and into text.

I remain optimistic despite these challenges from text. Remember that it’s early in the communications revolution, and we have a decent example of what to expect. Nearly 570 years ago, Johannes Gutenberg’s movable-type printing press overturned all previous processes and business models. That upheaval did not resolve in Internet time; instead, it led to more than 100 years of radical innovations and new business models. As my friend Neil Rest also points out, the West emerged with an entirely new business infrastructure, new religions (Protestantism), and new political philosophies.

As for today, speech remains in good shape: Moore’s Law is still alive, network quality and speeds continue to rise, and microphones and speakers continue to be ubiquitous. In our bread-and-butter business, companies still need to provide speech-based interfaces to their services, and people still need to dictate letters and email. We might have missed out on the latest Internet party, but there’s still plenty of champagne left in the bottle.

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
