If we’ve learned anything over the past few years, it’s that governments will make all sorts of excuses to justify the surveillance of their citizens.
Does that mean law-abiding people can no longer expect any privacy? Not according to Michael Kearns of the University of Pennsylvania—or, more specifically, an algorithm he has developed.
Governments say that they need to collect massive amounts of data from communications and other social networks in order to spot potential warning signs of, say, an imminent terror attack. With access to that data, the government can, often without permission, look for suspicious patterns that signal potential danger. This is called “graph search,” and it’s one of the most common methods of gathering intelligence.
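To make the idea concrete, here is a minimal sketch of what a graph search can look like: a breadth-first expansion outward from a known suspect over a communications graph, flagging everyone within a couple of hops for closer scrutiny. The graph, the seed, and the hop limit below are illustrative assumptions, not details from Kearns’s study.

```python
from collections import deque

def graph_search(contacts, seed, max_hops=2):
    """Breadth-first expansion from a known suspect over a contact graph.

    `contacts` maps each person to the set of people they communicate with;
    anyone reachable within `max_hops` of the seed becomes a candidate for
    closer scrutiny.
    """
    frontier = deque([(seed, 0)])
    visited = {seed}
    candidates = []
    while frontier:
        person, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for neighbor in contacts.get(person, set()):
            if neighbor not in visited:
                visited.add(neighbor)
                candidates.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return candidates

# A tiny, made-up communications graph with one known suspect.
contacts = {
    "suspect": {"alice", "bob"},
    "alice": {"suspect", "carol"},
    "bob": {"suspect"},
    "carol": {"alice", "dave"},
    "dave": {"carol"},
}
print(graph_search(contacts, "suspect"))  # e.g. ['alice', 'bob', 'carol'] (order may vary)
```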
In this context, privacy is far from guaranteed. But Kearns thinks that if we are to continue to live under mass surveillance, there ought to be a way for law-abiding citizens’ right to privacy to be preserved.
His solution is an algorithm that gives the government “provable privacy guarantees for its citizens,” Kearns told Quartz. A study about it has been published in the Proceedings of the National Academy of Sciences.
Here’s how it works.
Say you live in New York and have a rare disease. Only your doctor, who works at a highly specialized clinic, and your family are aware of the disease. It should be your right to keep the information about your condition private.
Now say you also have an aunt, who is a doctor in Philadelphia. You often call her for health advice.
Unbeknownst to you, both your New York doctor’s clinic and your aunt’s clinic are involved in medical fraud. Somehow, the government gets a hint about fraud in the Philadelphia clinic where your aunt works.
The government casts its net to see who else might be involved. Because your calls show up in the Philadelphia clinic’s records, the government decides to check whether you contacted any other doctors. That trail leads investigators to the New York clinic, and digging into its financial dealings may uncover its fraud.
If the government ends up busting both clinics, there’s a risk that people could find out about your disease. Some friends may know about your aunt and that you visit some sort of clinic in New York; government records related to the investigation, or comments by officials describing how they built their case, may be enough for some people to draw connections between you, the specialized clinic, and the state of your health.
“Now your privacy has been compromised, even though you weren’t arrested or even mentioned in the criminal proceedings,” Kearns told Quartz. “It happened because there is some auxiliary information about you that was public, and that meant some people who you didn’t want informed about your disease were able to infer you had it.” If your insurance provider gets wind of your illness, it might boost your premiums or even stop providing coverage altogether.
In cases like this, where only a few connections link the people or organizations under suspicion, Kearns’s algorithm would warn investigators that taking action could breach the privacy of particular individuals. If a law were to require a greater algorithmic burden of proof for medical-fraud cases, investigators would need to find alternative routes to justify going after the New York clinic.
But if there were lots of people who could serve as links between the two frauds, Kearns’s algorithm would let the government proceed with targeting and exposing both clinics. In this situation, the odds of compromising any particular individual’s privacy are lower. Of course, if an investigation focused on suspected terrorism instead of fraud, the law may allow the government to risk compromising privacy in the interest of public safety.
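The study’s formal guarantee is more involved than this sketch can convey, but the intuition behind the two scenarios above can be illustrated as a simple threshold rule: the search may connect two targets only if enough distinct people bridge them, so no single bystander is effectively exposed. The `privacy_threshold` parameter and the common-neighbor test here are illustrative assumptions, not the paper’s actual mechanism.

```python
def connecting_individuals(contacts, target_a, target_b):
    """People linked to both targets -- the 'bridges' an investigation would traverse."""
    return contacts.get(target_a, set()) & contacts.get(target_b, set())

def may_link_targets(contacts, target_a, target_b, privacy_threshold=10):
    """Allow the search to connect two targets only if many distinct people bridge them.

    With lots of bridges, no single bystander can be singled out by anyone who
    later learns how the case was built; with only a few, investigators are
    warned that acting would effectively expose those individuals.
    """
    bridges = connecting_individuals(contacts, target_a, target_b)
    return len(bridges) >= privacy_threshold

# With only one patient (you) connecting the two clinics, the expansion is blocked.
contacts = {
    "philadelphia_clinic": {"you", "aunt"},
    "new_york_clinic": {"you", "your_doctor"},
}
print(may_link_targets(contacts, "philadelphia_clinic", "new_york_clinic"))  # False
```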
Kearns’s algorithm is also designed to inject noise into social networks to shield the privacy of people innocently linked to targets, without hampering investigators’ attempts to identify potential wrongdoing. This would entail “mild, but important, departures from commonly used approaches,” Kearns’s study notes.
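The article doesn’t spell out the noise mechanism, but one standard way to add such noise, shown here purely as an illustrative sketch rather than the study’s method, is to randomly drop a small fraction of real links and add a few spurious ones before the graph is searched, so observers of the investigation’s output can’t be certain any particular bystander was genuinely connected to a target.

```python
import itertools
import random

def perturb_contacts(contacts, drop_prob=0.05, add_prob=0.001, seed=0):
    """Return a noisy, symmetric copy of a contact graph before it is searched.

    Each real link survives with probability 1 - drop_prob, and each absent
    link is added with probability add_prob, so the searched graph no longer
    reveals with certainty who was genuinely connected to whom.
    """
    rng = random.Random(seed)
    people = sorted(contacts)
    noisy = {p: set() for p in people}
    for p, q in itertools.combinations(people, 2):
        linked = q in contacts[p] or p in contacts[q]
        keep = rng.random() >= drop_prob if linked else rng.random() < add_prob
        if keep:
            noisy[p].add(q)
            noisy[q].add(p)
    return noisy
```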
Although this remains a proof of concept, Kearns and his colleagues ran some promising experiments on large datasets. They set themselves the task of identifying random subgroups of actors listed in the IMDB movie database and academic authors in a database of scientific papers, starting with random members of these groups and working through observable connections. The version of the algorithm designed to protect the privacy of individuals only coincidentally linked to target networks performed well compared with one that paid no heed to who was identified and exposed in the search for suspects.