July 15, 2009
By Moshe Yudkowsky, President, Disaggregate
Industry View

Thinking Backward and Thinking Ahead

The recent Star Trek movie made me realize just how long it has been since I saw an article in the mainstream press that explained speech technology by referencing that particular bit of popular culture. Maybe that’s because people now experience speech technology in their everyday lives: From phone calls to GPS devices, they no longer need a fictional example to understand the basic idea.

That said, we can still gain interesting insight into our technology from science fiction, but it’s probably not the insight you’re thinking of.

The cute android “Data,” who appears in Star Trek: The Next Generation, has something in common with the terrifying robots of The Terminator movies: text-to-speech (TTS) with perfect prosody and a complete lack of affect. You’d think that once TTS works well enough for each robot to have a unique and entirely human-sounding voice, the programmers would be able to add at least a little affect. Most popular science fiction suffers from this problem: bits and pieces of progress, but no coherent, fully realized technology.

I’ll let you in on a little secret: Serious sci-fi buffs wince at Star Trek because it never thinks things through, and thinking things through is the hallmark of great sci-fi. Take Star Trek’s transporters, which “beam” people from place to place. Star Trek borrowed these devices from other sci-fi works, and we who enjoy the genre understand quite thoroughly that small differences in how the technology actually works will change society in profound ways. With transporter technology, any home can be burglarized or any person kidnapped at any time, not to mention that transporters make both assassins and intercontinental ballistic missiles redundant. Star Trek ignores the societal implications of its basic premises: inexpensive space flight, nonhuman citizens, transporters, phasers, and, of course, perfect speech technology.

If you want a rare bit of brilliant, popular sci-fi for television created by someone who understands the societal implications of technology, as well as ordinary human behavior, then try to get your hands on Babylon 5.

Everyday Changes

We all know how perfect speech recognition will affect everyday life: We’ll be able to talk to our thermostats, refrigerators, and light switches. But a change in society regarding how people interact is a different matter entirely. No sci-fi author I’m aware of ever thought of something as odd as Twitter. How the integration of speech recognition with this and future Internet technologies will affect society has yet to be decently imagined, let alone fully understood. But be warned: Understanding does not bring wisdom. The dark side of speech recognition—the threat of universal surveillance—is an old sci-fi theme that is already partially upon us. Governments will attempt to track every word we say in any public place or over any telecommunications network. And, very clearly, sci-fi has not armed the populace with sufficient spunk. A network of video cameras in the United Kingdom spies on the populace with an efficiency the Gestapo could only envy in despair.

Often overlooked, but fascinating to think about, are the implications for society if TTS becomes truly excellent. Some writers postulate that children’s toys—the teddy bears and small dolls that children enjoy—will educate children from the earliest age (and, perhaps, will thus be regulated as educational tools by the government). Will children’s reading skills improve dramatically as their toys read books to them for as long as they like and with infinite patience?

Conversely, some authors imagine that TTS will make literacy irrelevant. If today’s schools fail to teach students to read, then TTS and speech recognition will create an underclass that cannot read or write at all. Author Larry Niven, a master at the art of thinking things through, proposes that in a multiplanetary, multilanguage culture, even starship controls will use only icons, speech recognition, and TTS. Consequently, even highly educated people will never learn to read. I worry that TTS, speech recognition, and real-time translation could fragment U.S. society as immigrants fail to learn—and have no incentive to learn—English.

I don’t want to sound pessimistic. Sci-fi, like other fiction, tends to focus on the negative and the dramatic. Real life is very different. As author Pat Cadigan says, “The difference between real life and fiction is that fiction has to make sense.”

The most useful lessons I have learned from sci-fi relate mostly to people, not things. While we often focus on how to add speech technology to one product or another, perhaps we could profit more by asking how speech technology could change people, society, and human interactions.

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
