Great article by Bruce Schneier titled “How To Not Catch Terrorists.”
“Data mining for terrorists: It’s an idea that just won’t die. But it won’t find any terrorists, it puts us at greater risk of crimes like identity theft, and it gives the police far too much power in a free society.
The first massive government program to collect dossiers on every American for data mining purposes was called Total Information Awareness. The public found the idea so abhorrent, and objected so forcefully, that Congress killed funding for the program in September 2003. But data mining is like a hydra–chop one head off, two more grow in its place. In May 2004, the General Accounting Office published a report that listed 122 different federal government data mining programs that used people’s personal information. That didn’t include classified military programs like Tangram, or state-run programs like MATRIX.
Now TIA is back with yet another name: Analysis, Dissemination, Visualization, Insight and Semantic Enhancement, or ADVISE. “It’s an experiment to see how you can better analyze data that you already have, that you’ve already legally collected, to see if you can understand it, sort it and make use of it more readily than simply doing it manually,” Homeland Security Chief Michael Chertoff told the Associated Press this month.
The names change, but the basic idea remains the same: suck up as much data as possible about everyone, sift through it with massive computers, and investigate patterns that might indicate terrorist plots. It’s a compelling idea, but it’s wrong. We’re not going to find terrorist plots through data mining, and we’re going to waste valuable resources chasing down false alarms.
Used properly, data mining is a great tool. As a result of data mining, AT&T reduces the costs of cell phone fraud, Amazon.com shows me books I might want to buy, and Google shows me advertising I’m more likely to be interested in. But it only works when there’s (1) a reasonable percentage of attacks per year, (2) a well-defined profile to search for, and (3) a low cost of false alarms.
Look at one of data mining’s success stories: credit card fraud. All credit card companies data mine their transaction databases, looking for spending patterns that indicate a stolen card. About 1% of cards are stolen and fraudulently used each year in the U.S.; that’s enough of a population to make searching for them effective. There are also common fraud patterns that can be computed from that data, and they’re easy to search for. Additionally, the cost of a false alarm is only a phone call to the cardholder asking him to verify a couple of purchases. Cardholders don’t even resent these phone calls–as long as they’re not too frequent–so the cost is just a few minutes of operator time.
Terrorist plots are different. First, attacks are very rare. This means that even very accurate systems will be so flooded with false alarms that they will be useless: millions of false alarms for every one real attack, even assuming unrealistically accurate systems.
Let’s look at some numbers. Assume an unrealistically optimistic system with a 1-in-100 false positive rate (99% accurate), and a 1-in-1,000 false negative rate (99.9% accurate). That is, while it will mistakenly classify something innocent as a terrorist plot one in a hundred times, it will only miss a real terrorist plot one in a thousand times. Assume one billion possible “plots” to sift through per year, about four per American citizen, and that there is one actual terrorist plot per year.
Even this unrealistically accurate system will generate 10 million false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate about 27,000 potential plots in order to find the one real terrorist plot per year.
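The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that plugs in only the essay's stated assumptions (a 1-in-100 false positive rate, a 1-in-1,000 false negative rate, one billion candidate “plots” and one real plot per year); the function and constant names are hypothetical, chosen for illustration.

```python
# Expected alarm counts for the essay's hypothetical screening system.
# All inputs are the essay's stated assumptions, not measured data.

FALSE_POSITIVE_RATE = 1 / 100    # innocent "plot" wrongly flagged
FALSE_NEGATIVE_RATE = 1 / 1_000  # real plot missed
CANDIDATES_PER_YEAR = 1_000_000_000
REAL_PLOTS_PER_YEAR = 1
DAYS_PER_YEAR = 365


def expected_alarms(candidates, real, fp_rate, fn_rate):
    """Return (expected false alarms, expected true detections) per year."""
    innocent = candidates - real
    false_alarms = innocent * fp_rate   # innocents flagged by mistake
    true_hits = real * (1 - fn_rate)    # real plots actually caught
    return false_alarms, true_hits


false_alarms, true_hits = expected_alarms(
    CANDIDATES_PER_YEAR, REAL_PLOTS_PER_YEAR,
    FALSE_POSITIVE_RATE, FALSE_NEGATIVE_RATE,
)
per_day = false_alarms / DAYS_PER_YEAR

print(f"false alarms per year: {false_alarms:,.0f}")
print(f"true detections per year: {true_hits:.3f}")
print(f"leads to investigate per day: {per_day:,.0f}")
```

Dividing roughly 10 million false alarms by 365 days gives about 27,400 leads a day, all to catch less than one real plot per year on average.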
In statistics, it’s called the “base rate fallacy,” and it applies in other domains as well. For example, even highly accurate medical tests are useless as diagnostic tools if the incidence of the disease is rare in the general population. Terrorist attacks are also rare, so any “test” is going to result in an endless stream of false alarms.
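The base rate fallacy comes down to Bayes’ rule: the fraction of alarms that are real (the positive predictive value) depends on how common real positives are, not just on the test’s error rates. The sketch below is illustrative only. It reuses the essay’s optimistic error rates for both cases, which is a hypothetical simplification, since the essay gives no error rates for credit card fraud detection; the 1% card-fraud rate and the one-in-a-billion plot rate are the essay’s own figures.

```python
def positive_predictive_value(base_rate, fp_rate, fn_rate):
    """Probability that a flagged item is real, via Bayes' rule.

    base_rate: fraction of screened items that are actually positive
    fp_rate:   P(flag | negative)
    fn_rate:   P(no flag | positive)
    """
    sensitivity = 1 - fn_rate
    true_flags = base_rate * sensitivity        # real positives flagged
    false_flags = (1 - base_rate) * fp_rate     # negatives flagged anyway
    return true_flags / (true_flags + false_flags)


FP, FN = 1 / 100, 1 / 1_000  # the essay's optimistic error rates

# About 1% of cards misused per year (the essay's credit card figure),
# versus one real plot among a billion candidates.
for label, base_rate in [("card fraud", 0.01), ("terror plots", 1e-9)]:
    ppv = positive_predictive_value(base_rate, FP, FN)
    print(f"{label:>12}: {ppv:.2e} of alarms are real "
          f"(about 1 in {1 / ppv:,.0f})")
```

With identical error rates, roughly half of the card-fraud alarms would be real, while only about one terror alarm in ten million would be: the detector is the same, only the base rate differs.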
Second, there is no well-defined terrorist profile. In hindsight, it was easy to connect the Sept. 11, 2001 dots and point to the warning signs, but it’s much harder to do so before the fact. Certainly, there are common warning signs that many terrorist plots share, but they share them with non-terrorist events as well. We live in a “six degrees of separation” world, where everyone is connected. Add in the problems of sleeper cells, loner terrorists like the Unabomber, and billions of perfectly innocent plots like surprise birthday parties and corporate takeovers, and you have an impossible problem.
And third, the cost of these false alarms is enormous. It’s not just the cost of the FBI agents running around chasing dead-end leads instead of doing things that might actually make us safer, but also the cost in civil liberties. The fundamental freedoms that make our country the envy of the world are valuable, and not something that we should throw away lightly.
There is something un-American about a government program that uses secret criteria to collect dossiers on innocent people and shares that information with various agencies, all without any oversight. It’s the sort of thing you’d have expected from the former Soviet Union or East Germany, or modern-day China.
Finding terrorism plots is not a problem that lends itself to data mining. It’s a needle-in-a-haystack problem, and throwing more hay on the pile doesn’t make the problem any easier. Real security comes from old-fashioned investigative work: putting people in charge of investigating potential plots and letting them direct the computers, instead of putting the computers in charge and letting them decide who should be investigated. It’s what caught the London liquid bombers last summer, and it’s our best hope for our own security in the future.”