
Simple math shows why the NSA’s Facebook spying is a fool’s errand

Math is hard, but finding persons of interest is even harder.

A biologist who specializes in statistics has calculated that when the NSA’s alleged broad dragnet of Facebook, Google and other sites turns up a potential terrorist, there’s only a 1 in 10,102 chance that he or she is an actual terrorist. And that’s using very conservative estimates for the accuracy of the NSA’s terrorist-identifying software used in its PRISM spying program, which means that US spies could be reading the contents of hundreds of thousands of Facebook, Gmail, Skype and Apple iMessage accounts in order to find a needle in a haystack.

Under the assumptions biologist Corey Chivers makes in his estimate of PRISM's effectiveness, the underlying problem is plain: picking out a very rare event (e.g., a person who is a terrorist) from a very large data set is hard, even with an accurate test.

Chivers assumed that the NSA's terrorist-detecting software is 99% accurate (P(+ | bad guy) = 0.99), which seems charitable. He also assumed that 1 in every 1 million users of any of these online services is a terrorist, which seems high (P(bad guy) = 1/1,000,000). And here's the equation he plugged those numbers into.

P(bad guy | +) = P(+ | bad guy) P(bad guy) / [P(+ | bad guy) P(bad guy) + P(+ | good guy) P(good guy)]

Solving that equation yields this: P(bad guy | +) = 1/10,102. In other words, 1 in every 10,102 positive hits from the NSA’s algorithm is actually a “bad guy.”
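The arithmetic is easy to check. The sketch below plugs Chivers' numbers into Bayes' theorem, assuming (as "99% accurate" implies) that the false-positive rate P(+ | good guy) is 1%:

```python
def posterior_bad_guy(p_pos_given_bad, p_bad, p_pos_given_good):
    """Bayes' theorem: probability a flagged account belongs to a 'bad guy'."""
    p_good = 1.0 - p_bad
    numerator = p_pos_given_bad * p_bad
    return numerator / (numerator + p_pos_given_good * p_good)

# Chivers' assumptions: 99% detection rate, 1% false-positive rate,
# 1 terrorist per 1,000,000 users.
posterior = posterior_bad_guy(0.99, 1e-6, 0.01)
print(round(1 / posterior))  # 10102
```

In other words, the tiny prior (1 in a million) swamps the algorithm's accuracy: the 1% of innocent users who are falsely flagged vastly outnumber the real hits.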

But what if the NSA’s algorithm is less accurate than 99%? And what if terrorists aren’t as fond of Facebook and Gmail as the NSA would like? Plugging less charitable numbers into the equation easily yields results that are far worse than Chivers’ estimate, suggesting that analysts might be confronted with 100,000 false positives for every real terrorist.

Hypothesis confirmed by former NSA official

William Binney, a former intelligence official at the NSA turned whistleblower, thinks Chivers' estimate is accurate. He says that the algorithms he created to detect persons of interest, for an earlier, discontinued project called ThinThread, were about 98% accurate. But for PRISM to be truly effective, he says, the NSA would need algorithms sophisticated enough to automatically detect terrorists and make decisions on their own. And according to Binney, the NSA probably isn't there yet.

“The problem is you have to have automated analysis of the data,” says Binney, who left the NSA in 2001. “[This system] doesn’t do that. It sorts things out for people and eventually presents information for analysts to make decisions.” He believes that the White House’s Big Data Initiative is an attempt to get private companies to help with the NSA’s problems of sorting through all the surveillance it gathers (among other things).

Broad dragnet is actually making the NSA less effective

During Binney’s tenure at the NSA, he says analysts acted as a layer of human intelligence to make up for the agency’s weak ability to parse large data sets. That helped increase the odds that the data they were sifting through contained a person of interest from, say, 1 in 1,000,000 to 1 in 1,000. But even then, says Binney, “[The NSA] just don’t have enough people. Using these systems is like a Google query that returns 100,000 results. If this happens every time they do that to try to find the one thing they’re after, basically what they’re doing is making themselves impotent at discovering threats.”