I was very indignant when I started to write this article. Speech has always been, and for the most part still remains, a second-class interface. It takes a back seat to keyboards, mice, and displays. Computers have speech tacked on as an afterthought; the same applies to cell phones. Perhaps a few PDAs have speech interfaces, but I’ve never seen one in use by an actual customer.
Designers ignore speech. When was the last time you saw a product designed specifically for a speech interface? A few specialized ones are out there, but they’re niche products for specialized segments of the population, like the elderly, the visually impaired, or warehouse workers. I challenge you to find a speech-centric product that is in the hands of even 1 percent of the U.S. population.
But my complaint goes deeper than this. I tried to come up with an application created specifically for speech interfaces (not just a product with a speech interface, but an application that isn’t possible, even in theory, without one) and ran into a brick wall. Perhaps there aren’t any? Perhaps speech deserves to be a second-class interface?
A small experiment comparing speech to the keyboard convinced me that life isn’t so bleak. Take your briefcase or purse, empty it onto your desk, and look at all the devices you use every day. When I do this I see the highest of high tech: cell phones, PDAs, calculators, and computers big and small. The high-tech gadgets may be expensive and exquisitely designed, but the applications for them and the tasks they accomplish are, for the most part, not just old but ancient. Take the word processor on your computer. During the past 100 years, writing has evolved from pen and ink to typewriter to keyboard, but the fundamentals of writing have remained unchanged since the days of painting in caves. The word processor makes it easy to add words and delete them, to distract yourself with pretty fonts and elaborate tables, but most word processors force you to relentlessly write one word after another—no different than using a quill and parchment. The mechanics of writing have changed, but the art of writing has not improved.
Other gear is much the same. The PDA replaced the desk calendar. Spreadsheets replaced chalkboards. Each generation of high-tech gear operates faster, costs less, and reduces drudgery; certainly everyone would rather carry a flash drive than the collection of clay tablets that our ancestors used to write on. But what really happens in a modern-day office that a visitor from the Middle Ages would not fundamentally understand?
A Few Takeaways
There are several lessons here. The first is that speech isn’t suffering from a unique invention deficit. Every inventor faces a severe challenge when trying to create entirely new categories of useful human activity. The second is that even though word processors, email, spreadsheets, and databases are updates of ancient activities, they are still wildly successful. The third is that there is no single path to success. The typewriter replaced the pen but not paper, and that succession came in evolutionary steps. The spreadsheet replaced the dusty chalkboard entirely, and that succession came through abrupt change. Speech interfaces can take either of these paths or carve out a new one.
And because almost all of today’s keyboard-based applications have their roots in activities as old as human history, perhaps there’s a hint of a final lesson: We need to rethink why we create speech applications. Can we build something entirely new? Can we create an application or device to accomplish something that was previously impossible? In the realm of word processing, for example, speech applications center around the idea of replacing keyboards, menus, and mice. What about a speech interface to replace pen and ink? What about a speech interface that replaces a stack of story ideas scribbled on index cards? Can a speech interface encourage and help an author write clearly and convincingly instead of just throwing words up on a screen?
Much of the world’s treasured and important literature revolves around spoken words: the dialogues of Socrates, the plays of Shakespeare, and the speeches of Winston Churchill. How can speech technology help infuse everyday writing with this speech-based magic? Professional writers can tell you about how they found their voice. I wonder if speech technology will let the art of writing find its own voice.
Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at firstname.lastname@example.org.