February 6, 2009
By Moshe Yudkowsky President - Disaggregate
Industry View

Thoughts for the Men in Black

Here’s a crime-fighting idea that I’ve proposed many times: Every six months, everyone in the United States should report to a local police station for a lie detector test. My proposal raises two very basic questions: First, would this program undermine our democratic institutions? Second, would it even work?

These questions also haunt the speech technology industry. If you supply speech recognition or related technology, you’ve probably had a friendly visit from the Men in Black—government employees wanting to know if your technology can sift through a huge collection of recordings to find discussions about certain topics. The Men in Black want to data-mine audio intercepts to find terrorists and criminals, and they wouldn’t mind picking up some military and political secrets on the side.

We have less time than ever to consider the ethics. Data mining at the local level could be coming to the U.S. very soon. An article in The Wall Street Journal asserts that the NYPD wants to monitor telephones in “high risk” areas to deter attacks; I couldn’t get the New York Police Department to confirm or deny the report. As the lessons of the late November terrorist attacks in Mumbai, India, sink in, we can expect pressure from governments and citizens to increase surveillance.

Keeping in mind the NYPD program, let’s imagine that Mumbai had taken a page from the airline industry and installed a giant black box, and that this black box recorded all phone calls, text messages, and Internet traffic throughout the city. The question then becomes: Would black-box recordings improve post-incident analysis?

There’s little question such information would be invaluable to the authorities in Mumbai. As they attempt to piece together the movements of the attackers, the effectiveness of the authorities’ response, and the broader tactical picture of the terrorist threat, they have eyewitness accounts and audio and video evidence, but do they have recordings of the mobile phone conversations between the terrorists? Do they have copies of photographs and comments that people sent each other via their phones and uploaded to social networking Web sites?

Let’s reformulate the question yet again: If an attack is in progress, could the black box and data mining provide real-time help to authorities? Could such information help prevent terrorist attacks? Those are lengthy discussions that we’ll save for another day. Instead, let’s turn to the biggest problem of all: How effectively can we mine speech?

Not surprisingly, this is a question that can’t be answered—at least not by me. The Men in Black know the answer, but they’re not talking. The companies that sell audio mining make no concrete promises about accuracy.

By the Numbers

Regardless, I’d like to offer a few guesses because I hope someone will provide better numbers, and because I want to rebut, in advance, the “big, scary number” criticism.

What are the big scary numbers? Let’s assume that the 19 million people in the greater New York area generate roughly 10 million phone calls per day for the NYPD to monitor: a nice, big, scary number. If audio mining generated false positives only 1 percent of the time, the NYPD would still have 100,000 false positives per day, another big, scary number. At one suspicious call per minute, screening them would take roughly 1,700 person-hours every day, or about 70 screeners on each eight-hour shift around the clock. So the NYPD program is too expensive, right?

Not necessarily. The NYPD will monitor only calls in “high risk” areas. So let’s assume, then, that the department concentrates on only 100,000 calls, which generate just 1,000 false positives per day—almost a trivial number.
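The back-of-the-envelope figures above can be checked with a short sketch. It assumes, as this column does, a 1 percent false-positive rate, one minute of screening per flagged call, and eight-hour shifts; the function name and its parameters are illustrative, not anything the NYPD or a vendor has published.

```python
import math


def screening_load(calls_per_day, false_positive_rate,
                   minutes_per_review=1.0, shift_hours=8.0):
    """Estimate the daily workload of reviewing falsely flagged calls.

    All inputs are guesses for illustration, not measured figures.
    """
    # Calls wrongly flagged as suspicious each day.
    false_positives = round(calls_per_day * false_positive_rate)
    # Total human review time per day, in hours.
    person_hours = false_positives * minutes_per_review / 60.0
    # Number of individual shifts that must be staffed each day.
    shifts_per_day = math.ceil(person_hours / shift_hours)
    # Screeners who must be on duty at any moment for 24-hour coverage.
    screeners_on_duty = math.ceil(person_hours / 24.0)
    return {
        "false_positives": false_positives,
        "person_hours": person_hours,
        "shifts_per_day": shifts_per_day,
        "screeners_on_duty": screeners_on_duty,
    }


if __name__ == "__main__":
    for label, calls in [("citywide", 10_000_000), ("high risk", 100_000)]:
        print(label, screening_load(calls, 0.01))
```

Under these assumptions the citywide case works out to about 1,667 person-hours a day, or roughly 70 screeners on duty at any moment, while the “high risk” case needs about 17 person-hours a day, which a single screener per shift could cover.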

But we can do better if we assume that the Men in Black are at least as smart as television networks, which engage in the relentless pursuit of trivia and instantly discover statistics in real time during football games. Don’t you think the Men in Black are at least that clever? The Men in Black have monitored audio since before World War II. With experience comes wisdom and statistics. A few filters in the data flow—telling the system to always ignore this telephone number or always listen to that one—along with some algorithms will rapidly reduce false positives to manageable proportions.
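The kind of filter described above can be sketched in a few lines. This is only a guess at the idea, not a description of any real system: the `Call` record, the `flagged` field (standing in for whatever an audio-mining pass reports), and the `triage` function are all hypothetical, and the phone numbers are fictional.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    number: str    # telephone number associated with the call
    flagged: bool  # hypothetical verdict from an audio-mining pass


def triage(calls, ignore, always_listen):
    """Return the calls a human should review.

    Numbers in always_listen are reviewed no matter what; numbers in
    ignore are dropped even when flagged; every other call is reviewed
    only if the audio-mining pass flagged it.
    """
    to_review = []
    for call in calls:
        if call.number in always_listen:
            to_review.append(call)
        elif call.number in ignore:
            continue
        elif call.flagged:
            to_review.append(call)
    return to_review


if __name__ == "__main__":
    calls = [
        Call("555-0100", True),
        Call("555-0101", True),   # known false alarm, on the ignore list
        Call("555-0102", False),
        Call("555-0103", False),  # on the always-listen list
    ]
    reviewed = triage(calls, ignore={"555-0101"}, always_listen={"555-0103"})
    print([call.number for call in reviewed])
```

Every number added to the ignore list removes its false positives from the review queue for good, which is how a handful of simple rules could shrink the workload quickly.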

So we can’t escape our ethical problem by claiming the technology won’t work. We (probably) have effective technology to monitor all calls in New York’s “high risk” areas. We now return to the unanswered basic question: Would this program undermine our democratic institutions?

Moshe Yudkowsky, Ph.D., is president of Disaggregate Consulting and author of The Pebble and the Avalanche: How Taking Things Apart Creates Revolutions. He can be reached at speech@pobox.com.
