With these six attributes, there are 13,921 combinations. Three in 10 combinations result in a single unique individual. Still, those combinations are rare; just 4,205 individuals among the 1.3 million, or 0.33%, can be uniquely identified if you know those six things about them.
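The counting behind those figures can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used for this article: the six field names and the toy records are hypothetical stand-ins for the real attributes and database.

```python
from collections import Counter

# Hypothetical toy records. Each tuple holds six made-up attributes:
# (age_group, sex, race, ethnicity, zip3, admission_type)
records = [
    ("18-29", "M", "Black", "Multi-ethnic", "112", "Emergency"),
    ("18-29", "M", "Black", "Multi-ethnic", "112", "Emergency"),
    ("30-49", "F", "White", "Not Hispanic", "100", "Elective"),
    ("50-69", "F", "White", "Not Hispanic", "100", "Elective"),
    ("50-69", "F", "White", "Not Hispanic", "100", "Elective"),
    ("70+", "M", "Other", "Hispanic", "104", "Urgent"),
]


def uniqueness_summary(rows):
    """Count attribute combinations and how many match exactly one record."""
    counts = Counter(rows)  # combination -> number of matching records
    unique_combos = sum(1 for n in counts.values() if n == 1)
    return {
        "combinations": len(counts),
        "unique_combinations": unique_combos,
        # Share of distinct combinations that point to a single person
        "share_of_combinations_unique": unique_combos / len(counts),
        # Share of all individuals who are alone in their combination
        "share_of_individuals_unique": unique_combos / len(rows),
    }


summary = uniqueness_summary(records)
print(summary)
```

In this toy data, half of the combinations are unique but only a third of the individuals are, the same gap between the two shares that the real figures show on a much larger scale.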

The structure of a database can make some groups of individuals more likely to be uniquely identified. In our example, the chance of identifying Jordan depends heavily on the demographic groups to which he belongs. If Jordan is a Black, multi-ethnic man, aged 18 to 29, the probability of identifying him can be as high as 27%, a big jump from the 0.51% probability of identifying the average individual. That’s because there are only 15 Black, multi-ethnic men aged 18 to 29 in New York’s database.

Even in cases where multiple records match the profile of Jordan, a comparison of the records themselves could lead you to Jordan: Is it more likely that Jordan was treated for “congestive heart failure” or “mood disorders”? Or, if you knew that Jordan was only away for one week, you could exclude the matching profile with a 50-day hospital stay.
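As a rough sketch of that narrowing step, assume two hypothetical records share Jordan's profile. The field names, record IDs, and values below are invented for illustration, and the one-week threshold simply mirrors the example above.

```python
# Hypothetical records that all match the same demographic profile.
candidates = [
    {"record_id": 1, "diagnosis": "Mood disorders", "length_of_stay_days": 6},
    {"record_id": 2, "diagnosis": "Congestive heart failure", "length_of_stay_days": 50},
]


def narrow_by_stay(rows, max_days):
    """Keep only records whose hospital stay fits what we know about the person."""
    return [r for r in rows if r["length_of_stay_days"] <= max_days]


# Outside knowledge: the person was away for about one week.
remaining = narrow_by_stay(candidates, max_days=7)
print(remaining)
```

With the 50-day stay excluded, a single record remains, so one extra fact is enough to turn an ambiguous match into a unique one.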

In the US, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) protects the privacy of patients by only allowing personally identifiable information to be used or disclosed under a limited number of circumstances. For example, it would be illegal to make public this database if it had the patient’s name, home address, phone number, or social security number included. HIPAA places no restrictions, however, on the use or disclosure of de-identified health information. Yet, given just a small bit of external information, one can easily link de-identified health records back to individuals using simple techniques.

De-anonymizing data often involves combining multiple databases to extract unrelated information about the same person and piece together a full picture. An entire industry of data brokers has sprung up to take on this work and sell your information to others.
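A simplified sketch of such a linkage, assuming two hypothetical datasets that share a few attributes: one has names but no health details, the other has diagnoses but no names. The field names, the people, and the records are all invented for illustration.

```python
from collections import defaultdict

# Attributes assumed to appear in both hypothetical datasets.
KEYS = ("age_group", "sex", "zip3")

# De-identified health records: no names, but detailed medical information.
health_records = [
    {"age_group": "18-29", "sex": "M", "zip3": "112", "diagnosis": "Mood disorders"},
    {"age_group": "30-49", "sex": "F", "zip3": "100", "diagnosis": "Asthma"},
    {"age_group": "30-49", "sex": "F", "zip3": "100", "diagnosis": "Fracture"},
]

# A second dataset that carries names alongside the same attributes.
named_records = [
    {"name": "Person A", "age_group": "18-29", "sex": "M", "zip3": "112"},
    {"name": "Person B", "age_group": "30-49", "sex": "F", "zip3": "100"},
]


def link(named, anonymous, keys=KEYS):
    """Attach an anonymous record to a name when exactly one record matches."""
    index = defaultdict(list)
    for row in anonymous:
        index[tuple(row[k] for k in keys)].append(row)

    linked = {}
    for person in named:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # an unambiguous match re-identifies the record
            linked[person["name"]] = matches[0]
    return linked


linked = link(named_records, health_records)
print(linked)
```

Here Person A is re-identified because only one health record fits, while Person B stays ambiguous with two candidates. Each additional shared attribute shrinks those candidate sets further.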

Insurance companies want to know your medical history. Car sellers want to get a hold of your driving habits. Real estate brokers would pay to find out if you just had a newborn and are looking to buy a house.

The example here uses just six factors about an individual to try to identify them. Databases collected by companies can contain hundreds of factors about a person. With the amount of data consumers consciously and unconsciously give away to big tech platforms, de-anonymization has become easier in recent years, and not just with simple methods like this one.
