When Judge Robert Bork was nominated for the Supreme Court in 1987, a reporter for the weekly Washington City Paper discovered they both used the same video rental store. He popped in, asked for a list of Bork’s rented movies, and published an article revealing the judge’s love of Hitchcock films and British costume dramas.
So began the creation of one of the strongest data privacy laws in the United States.
Lawmakers were aghast. “Public officials, hearing of Bork’s rentals coming to light, even in whimsy, imagined their video records surfacing like so many nightmares,” wrote Michael Dolan, the journalist behind the Bork tapes. “First local and then state and finally federal law was enacted to keep anybody else from doing what I had done.” The 1988 Video Privacy Protection Act (VPPA) made it illegal to share video lists without the customer’s written consent.
Three decades after the Bork tape saga, the law is nowhere near as strong (Netflix persuaded the House of Representatives to amend it in 2012), and the level of privacy concern around video rental data seems quaint.
Smartphones and the internet have opened up an abundance of data on every individual, which companies vie to harvest and merge. And those who control this data have massive power and insight. Data has been used to target women in abortion clinics, to profile voters based on information purchased from a parenting advice website, and to identify when teenagers are feeling “worthless,” so as to better sell to them.
Collecting and selling data is a massive gold rush. The international data market was valued at $20.6 billion at the end of 2018, with the US market making up $12.3 billion of that share, followed by the UK and China markets, both at $2.4 billion. The players in this new economy are quietly infiltrating and affecting every aspect of our lives.
Companies use data to identify customers’ interests, how long they use products, and what they might want, all of which helps them advertise effectively and sell more. A growing number of companies even offer different prices to different customers, based on machine-learning algorithms that evaluate their likelihood of buying something at a given price.
But data is increasingly valuable in its own right, rather than as a means to analyze and enhance sales. As such, it’s often harvested simply to sell on to someone else. “The companies collecting the data don’t fully know what they’re going to use it for,” says Serge Egelman, research director of the Usable Security and Privacy Group at the International Computer Science Institute.
Governments have tapped into the trove of new information online in their bid to use data for security. Whistleblower Edward Snowden revealed that the National Security Agency (NSA) was given access to phone records of millions of people in the US, while the UK Government Communications Headquarters (GCHQ) tapped fiber optic cables internationally to intercept internet data.
John Tye, former section chief for internet freedom in the US State Department, who was a whistleblower under the Obama administration, says the US government use of data is so “legally, politically, technologically” complicated that only a handful of people inside the US government understand the full scope.
The government use of data, says Tye, is so complex that the NSA created a data center in Utah that’s “five times the size of the US Capitol building” and has a $40 million a year electricity bill. The data center collects information including emails, cell phone calls, and Google searches, as well as purchase history and other personal data. “It is, in some measure, the realization of the ‘total information awareness’ program created during the first term of the Bush administration—an effort that was killed by Congress in 2003 after it caused an outcry over its potential for invading Americans’ privacy,” reported Wired in 2012.
Much government use of data is kept secret, but a few details are necessarily public. For example, the US State Department recently told visitors they’d have to hand over all social media details for visa applications, though it hasn’t disclosed exactly how this information will be used.
Government use of data also informs police work. Fieke Jansen, a doctoral student at the Data Justice Lab of Cardiff University who studies data trends in policing, says that in much of Europe, such as in the UK, Germany, and The Netherlands, there’s huge investment in building up and combining police databases. Police forces have started experimenting with automatic fingerprint identification, facial and voice recognition, and automatic license plate recognition, she notes in her research. The expansion of databases will likely increase pressure on police to collect data at every opportunity, for example when someone files a report or is stopped for speeding.
While police forces are building their own databases, they’re also turning to private companies for data. Google has been asked to provide GPS info showing if someone was near a crime scene. And the first case using DNA evidence uploaded to GedMatch, a website that allows people to upload data from DNA testing kits, recently went to trial.
Politicians who use data to target voters are some of the more controversial figures in the industry. Political consulting firm Cambridge Analytica notoriously harvested the data of 50 million people through Facebook so as to profile people and send them personalized political adverts. Similar tactics were used to send political propaganda via WhatsApp in Brazil’s 2018 election, which pushed far-right President Bolsonaro to power.
Research shows targeted ads can be highly effective. The Trump campaign spent $44 million on Facebook adverts in 2016, compared to $28 million from the Clinton campaign. Data provided by Cambridge Analytica identified those who should be targeted by particular adverts. For example, one advert portraying Hillary Clinton as racist was targeted at “only the people we want to see it,” Brad Parscale, a San Antonio marketing entrepreneur and Trump aide, told Bloomberg ahead of the election. “It will dramatically affect her ability to turn these people out,” he added.
Trump is far from the only one to benefit from misuse of data. Last year, parenting advice website Emma’s Diary was fined for selling its data to the UK Labour party. Labour got this data via Experian and legally works with the company for direct marketing campaigns.
Though UK political parties are allowed to work with data brokers on campaigns, this can nevertheless have concerning implications. “We’re talking about a ranking of the priorities of certain voters over others, as opposed to the more idealistic spirit of democracy, where everyone is spoken to with the same message so the group with the better idea or more convincing argument wins out,” says Gary Wright, researcher for Tactical Tech non-profit. “It gets to the heart of how the digital landscape is shaping democracy. We’re witnessing a quantification, segmentation, ranking, ordering of the electorate.”
Many of the largest companies in the market are “data brokers” that collect information about consumers and sell the data to other companies. Some of the biggest include:
- Oracle, a California-based software company that, as the Financial Times reports, sells data on more than 300 million people, including more than 80% of the US internet population, with 30,000 data points per individual.
- Experian, a credit rating agency that has data on 300 million people and works to segment populations into “19 overarching groups and 71 underlying types.”
- Acxiom, which has data on around “250 million addressable consumers” in the US alone: Last year, the company was bought by Interpublic Group for $2 billion.
- Epsilon, which was bought by advertising company Publicis earlier this year for $4.4 billion.
- Nielsen, which boasts purchase history data on 90 million US households, or three quarters of all households in the United States.
Some companies, called “onboarders,” specialize in transferring data stored offline—such as contact information, loyalty card data, purchases, subscriptions, and the demographic data that is often stored in companies’ Customer Relationship Management (CRM) systems—online so they can be used for marketing purposes. Many of the biggest data onboarders have been bought by major data brokers. The data they upload is sold to advertisers, businesses, and other data brokers. Onboarders include:
- LiveRamp, which was bought by Acxiom for $310 million in 2014.
- Datalogix, which was bought by Oracle for $1.2 billion in 2014.
- Neustar, which partnered with Experian to provide “advanced data onboarding solutions” in 2016.
It’s no accident that data dealers have started acquiring each other and working together. “A feature of the digital data economy is consolidation,” says Jeffrey Chester, Executive Director of the Center for Digital Democracy. “Everyone’s working with everyone else.” Much of this, says Chester, is in response to the tech giants (more on them below). The likes of Amazon, Facebook, and Google have immense amounts of data, and others in the industry have merged to try and compete. “At the end of the day,” he says, “you have a handful of global data power brokers.”
Though the companies listed above are some of the best-known names in data, it’s a mistake to assume the field is clearly defined. “‘Data broker’ is a contentious term,” says Frederike Kaltheuner, head of corporate exploitation at Privacy International. There’s an overlap between data brokers, ad tech, and credit ratings agencies. And two of the companies that hoard the most data, Facebook and Google, don’t consider themselves data brokers.
“They play the same role of a data broker,” says Jason Kint, chief executive of non-profit Digital Content Next. “Their entire business model is dependent on harvesting as much data as possible.”
Google Analytics, which tracks visitors to websites, is used by 75% of the top 100,000 most visited websites, while Google Marketing Platform (formerly known as DoubleClick) is collecting data from more than 1.6 million websites. Meanwhile, Facebook tracks visitors on more than 8.4 million websites. These companies don’t sell the data they harvest to other companies, but they do sell access: Advertisers can target precise demographics via Facebook and Google, and this is only possible thanks to the tech giants’ mounds of data.
Amazon, meanwhile, owns the purchasing histories of more than 150 million customers. This data is hugely valuable, because it reveals how customers spend, as well as indicating whether advertising is effective. But Amazon has other data sources too: Earlier this year, it was revealed Amazon staff listen to a selection of recordings captured by Alexa.
Apple bills itself as the privacy-friendly big tech company but, like Amazon, it has access to plenty of voice data via Siri and purchase history from iTunes, Apple Pay, and its app store. Plus there’s device data, GPS, iCloud, and photos (users can download their Apple data at privacy.apple.com). And while Apple doesn’t sell that data, it does sell targeted ads based on news and app store use. Plus, Apple accepted $12 billion from Google in 2019 to make Google the default search engine on Apple products, which suggests the company doesn’t have a problem with Google collecting our search data.
There’s no clear divide between companies that are interested in gathering data and those that aren’t. After all, everyone selling a product relies on data to sell it more efficiently. “The capabilities of data brokers have been incorporated into every global leading business,” says Chester. The likes of Target, General Motors, and McDonalds “are in the big data business now,” he adds. “Companies are bringing a vast amount of data in house.”
It’s not just major companies collecting customer data. One analysis of the “marketing technology” sphere (companies that typically rely on data to market products), found more than 7,000 examples of such companies in the United States, compared to around 150 in 2011.
Small companies can have major impact. For example, Vectaury, a small French digital marketing company, was investigated under Europe’s General Data Protection Regulation (GDPR) for failing to get proper consent for gathering data. Though Vectaury is not well known, it was found to have vacuumed up data from 67.6 million people via 32,000 apps. “The most invasive stuff seems to come from small companies,” says the International Computer Science Institute’s Egelman. “The big companies that have eyes on them, like Facebook and Google, can hire people for privacy teams.” Smaller companies don’t have the same means, and generally face less scrutiny.
“Nearly every app you use on the phone shares data with a third party,” says Kaltheuner. “Have you agreed to this? Maybe.” Consent forms can be incomprehensible to anyone who isn’t a legal expert. But harvesting and selling data is the primary business model for apps; one 2015 study of apps in Australia, Brazil, Germany, and the US found that 85% to 95% of free apps and 60% of paid apps share personal data with third parties. In one particularly unnerving example, Facebook paid teenagers to install an app that downloaded personal data, including photos, emails, and messages, from their phones.
Seemingly basic data can be used to predict extremely personal characteristics. The table below highlights some of these connections; aside from the first line on smartphone data, based on a 2011 study, and the line on evaluating creditworthiness based on Facebook friends, each point is explored in more detail throughout this article.
Knowing where you are every minute of the day goes a long way toward figuring out exactly who you are. Cell phones hold the key to this information, as people increasingly carry these easily trackable devices wherever they go. US cell carriers, including AT&T, Verizon, T-Mobile and Sprint, have sold customers’ real-time location data, and some of that information was then both sold on again and leaked. They promised to stop when they were caught last year, but further investigations found location sharing is still rife.
Meanwhile, countless phone apps, such as The Weather Channel (which forecasts the weather), GasBuddy (which finds nearby gas stations), and many more, also harvest and sell location data. A December 2018 New York Times article showed it’s possible to identify people based on location data from apps. The data is so granular that the Times could track someone to a doctor’s appointment and see how long she was there. There are 75 companies getting such location data from around 200 million smartphones in the United States, according to the Times report.
Location data is hugely revealing. “Are you at a computer, do you work at home, are you unemployed?,” says Kaltheuner. “How fast do you drive your car, do you take the train?”
The information is frequently used by advertisers to target someone when they’re in a precise location, such as near a particular shopping center. It’s also been used to target anti-abortion adverts at women entering Planned Parenthood clinics.
Wellness and fitness apps are some of the most prolific gatherers of health data; their “wellness” branding makes it easier to avoid health data protection laws that apply to medical providers. Flo Period & Ovulation Tracker collects data on when a woman has her period or is planning to get pregnant and, earlier this year, the Wall Street Journal revealed the app was sharing this information with Facebook. Several other health and fitness apps, including those that measure heart rate, weight, and activity, were doing the same. It’s not only wellness apps storing health data: The dating app Grindr shared users’ HIV status and “last tested date” with two other companies, BuzzFeed revealed last year.
Many apps aren’t explicit about their intention to share data. Kaltheuner said she recently wanted to download a mood-tracking app, but gave up over privacy concerns. “I don’t want anyone to know what I’m entering. I couldn’t find a single app that I would trust,” she said. (Quartz uncovered several other apps sharing personal data with third parties in part five of this series.)
Companies that gather sufficient datasets can profit by selling them to pharmaceutical companies. Personal genomics company 23andMe, for example, sold GlaxoSmithKline the exclusive rights to more than 5 million customers’ genetic data. “Once you have the data, [the company] does actually become the Google of personalized health care,” Patrick Chung, a 23andMe board member, told Fast Company. Though the company does ask customers whether they want to share their data, privacy advocates say the wording can be confusing. For example, the company states that research based on analyzing its databases “does not constitute research on human subjects,” as Scientific American reports, and so isn’t subject to the same privacy rules.
What you Google, click on, and upload online reveals a huge amount about you. Studies have shown that both Facebook likes and dating site photos can be used to predict sexual orientation, while keystroke patterns can reveal neurodegenerative diseases. Everything from Instagram filters to pronoun use on social media has been found to predict depression.
Such personal information can be used for good: Detecting disease can lead to treatment, and Facebook has an algorithm that assesses suicide risk and can call the police for help.
But access to personal information also carries tremendous risk. Lyle Ungar, a University of Pennsylvania professor who was co-author of the study on pronoun use and depression, said he’s been approached by several insurance companies keen to use his data analysis to determine premiums. Incidentally, New York’s Department of Financial Services (NYDFS) decided earlier this year that life insurers can use data from social media to determine premiums.
AI facial recognition is an increasingly valuable and common form of data. President Donald Trump has announced plans for facial recognition technology to be used for all international passengers in the 20 biggest US airports by 2021. There are no limits on how partnering airlines can use this data. In China, facial recognition is even more commonplace; the country’s largest insurer, Ping An Insurance, uses the technology both to verify identity and to scan for signs of trustworthiness.
Major data corporations often claim to only process “anonymized” data, but this term is misleading. Typically, it means that the data isn’t attached to a name, and that email addresses and phone numbers are hashed. But rather than masking identity, these hashed values serve as their own identifiers. Effectively, the data is pseudonymized.
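To see why hashing falls short of anonymization, consider a minimal Python sketch (the email address and data here are hypothetical): a hash function is deterministic, so two datasets that each hash the same email address still share a common identifier, and their records can be joined.

```python
import hashlib

def pseudonymize(email: str) -> str:
    # A deterministic hash: the same input always yields the same token.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()

# Two unrelated datasets that each "anonymize" the same customer...
ad_profile = {pseudonymize("jane@example.com"): {"interests": ["horror films"]}}
loyalty_card = {pseudonymize("jane@example.com"): {"zip": "11231"}}

# ...still share an identifier, so the records can be merged.
shared_keys = ad_profile.keys() & loyalty_card.keys()
for key in shared_keys:
    merged = {**ad_profile[key], **loyalty_card[key]}
    print(merged)  # fields from both datasets, linked by the shared hash
```

The hash removes the name from view, but as long as every data holder hashes the same way, it functions as a persistent ID rather than a privacy shield.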
The more pseudonymized data you have on someone, the less likely it is to be anonymous. Tens of thousands of twenty-something women live in Brooklyn, for example, so that data profile isn’t particularly identifying. But if you narrow it down to 29-year-old women who live in Gowanus, go to the YMCA, earn $92,600 a year, and love watching horror movies, well, there aren’t too many candidates.
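The effect can be sketched in a few lines of Python, using a hypothetical set of “anonymized” profiles: each additional attribute, or quasi-identifier, shrinks the pool of matching candidates until only one person fits.

```python
# Hypothetical population of "anonymized" profiles (no names attached).
people = [
    {"age": 29, "neighborhood": "Gowanus", "gym": "YMCA", "genre": "horror"},
    {"age": 29, "neighborhood": "Gowanus", "gym": "YMCA", "genre": "comedy"},
    {"age": 29, "neighborhood": "Park Slope", "gym": "YMCA", "genre": "horror"},
    {"age": 34, "neighborhood": "Gowanus", "gym": "none", "genre": "horror"},
]

def candidates(pop, **attrs):
    # Keep only the profiles matching every known attribute.
    return [p for p in pop if all(p[k] == v for k, v in attrs.items())]

# Each extra quasi-identifier shrinks the "anonymity set."
print(len(candidates(people, age=29)))                                          # 3
print(len(candidates(people, age=29, neighborhood="Gowanus")))                  # 2
print(len(candidates(people, age=29, neighborhood="Gowanus", genre="horror")))  # 1
```

Once the set shrinks to one, the “anonymous” record is, in practice, a named individual.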
It can be very easy to de-anonymize data. Back in 2006, for example, Netflix published 10 million movie rankings by 500,000 customers, and two researchers quickly de-anonymized some of the data by matching it with public information on rankings and timestamps in the Internet Movie Database. (Learn how easy it is to de-anonymize data in part four of this Quartz series.)
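A toy version of that kind of linkage attack, with invented usernames and viewing records, shows the mechanics: overlapping (title, date) pairs act as a fingerprint that matches an “anonymous” record to a public profile.

```python
# Hypothetical "anonymized" rating records, keyed only by an opaque ID.
anonymous_ratings = {
    "user_8317": {("Vertigo", "2006-03-01"), ("Rebecca", "2006-03-04"),
                  ("Psycho", "2006-03-09")},
}

# Hypothetical public profiles with dated reviews (e.g. scraped from IMDb).
public_reviews = {
    "r.bork": {("Vertigo", "2006-03-01"), ("Rebecca", "2006-03-04")},
    "film_fan_99": {("Alien", "2006-05-02")},
}

def best_match(anon_set, profiles):
    # Score each public profile by how many (title, date) pairs it shares
    # with the anonymous record, then return the highest scorer.
    scores = {name: len(anon_set & seen) for name, seen in profiles.items()}
    return max(scores, key=scores.get)

print(best_match(anonymous_ratings["user_8317"], public_reviews))  # r.bork
```

Even a handful of matching timestamps is usually enough, because few people rate the same films on the same days.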
Data isn’t necessarily all linked together and tied to one individual, but a process called “identity resolution” is used to match up different data sets to create more detailed profiles. Major data brokers such as Acxiom and Experian have troves of data that are used to help accurately identify individual customers.
Meanwhile, tech companies work to collect masses of data to create a full picture of customers. When Google bought digital ad company DoubleClick in 2007, it promised it would not combine DoubleClick cookies, which track browsing history, with the names and identifying details collected by Gmail and YouTube. In 2016, it dropped that separation. “You have to ask yourself why,” says Digital Content Next’s Kint. “That certainly was an aggressive move that could backfire based on previous promises.” Kint suggests Google made the change to compete with Facebook, which started tracking people as they browse the internet in 2014. Facebook’s single identifier was a huge advantage. “Facebook had a deterministic data point to know this is precisely Olivia or Jason,” Kint says. “Google needed that in mobile and could do that by merging their services.”
As Edward Snowden wrote in a 2015 Reddit AMA, “Arguing that you don’t care about the right to privacy because you have nothing to hide is no different than saying you don’t care about free speech because you have nothing to say.”
Data reveals hugely personal information and, as in any negotiation, this insight can be used to manipulate people. In 2017, a leaked Facebook document showed that the company offered advertisers the chance to target teenagers in moments when they were feeling “worthless,” “insecure,” “stressed,” “defeated,” “anxious,” and like a “failure.” Facebook knew about users’ emotional states thanks to its data on photos, interactions, posts, and internet activity. The company immediately denied that it had used such practices in targeted advertising.
“Imagine if you went to a library to research substance abuse, and a librarian put a sticker on your back that says ‘interested in substance abuse,’” says Johnny Ryan, chief policy officer at Brave, a web browser that blocks ads and website trackers. Cookies are the equivalent of these stickers being put on your back. It’s extremely difficult to control who has access to data, meaning that, for those who’ve suffered from drug addiction or serious diseases, data trails are likely spreading news of such conditions.
Data harvesting has a disturbing tendency to hurt those already in crisis. Health insurers, for instance, collect masses of data, and this information significantly impacts those who are already struggling, translating into higher premiums for those who are less wealthy. As ProPublica and NPR put it: “Low-income and a minority? That means, the data brokers say, you are more likely to live in a dilapidated and dangerous neighborhood, increasing your health risks. Are you a woman who’s purchased plus-size clothing? You’re considered at risk of depression. Mental health care can be expensive.”
In the United States, it’s illegal to deny someone a mortgage loan based on race, a practice known as redlining. It is legal though, notes Ungar, to offer mortgage loans based on language use on social media. And Ungar says it’s very easy to determine race based on the data available on social media.
Similarly, numerous police forces use AI to predict future crime based on existing data, but there are serious concerns that, if the datasets are flawed, they could reinforce biases such as racism. In the UK, for example, the Metropolitan police in London created a “gangs matrix” listing 3,806 people, of whom 78% were black, though statistics show that just 27% of people responsible for serious youth violence in London are black. The matrix was so disproportionately focused on black men that it failed to include any Turkish gang members in an area known for several Turkish gangs. The information on the gangs matrix was also shared with other agencies, such as housing associations and schools, which was deemed a breach of data protection laws.
The mere threat of data surveillance also has a greater impact on people of color. In the late 18th century, British philosopher Jeremy Bentham proposed the panopticon, a prison designed so that prisoners could never tell when they were being watched. That concept alone would change prisoners’ behavior, he argued. Today, all our Google searches are being watched.
Kaltheuner (who isn’t Muslim) says that, shortly after watching a YouTube tutorial on tying a hijab, she noticed adverts warning her of the dangers of joining ISIS. Meanwhile, when counterterrorism police visited a family’s home in response to Google searches on pressure cookers and backpacks, they said they made around 100 such visits a week. Internet monitoring can work much like the panopticon, and, for minority groups who are already persecuted, it creates a heightened fear about what their Google searches might imply. Even for those who aren’t visited by the police, the concept of data tracking can restrict freedoms.
Companies can use data to easily identify big spenders, which can lead stores to woo richer customers and ignore those who are less well off. When Amazon launched one-day shipping, for instance, Bloomberg found the service was heavily biased in favor of white customers. In Boston, for example, the primarily black neighborhood of Roxbury was denied the service, while all the surrounding neighborhoods were included. Amazon didn’t factor race into its data-based decision to roll out the service to certain neighborhoods, but by focusing on those zip codes with the highest number of Amazon Prime subscribers, it reinforced neighborhood inequality. Ungar says it’s likely that many other companies, such as banks offering loans, use similar data to inform their decisions; such practices are simply less accessible to public scrutiny than Amazon’s service.
“A lot of it is invasive and creepy, but there’s no guarantee it’s correct,” says Kaltheuner. “If you’re classified negatively, you can have no idea of the consequences.” In ProPublica and NPR’s investigation into how health insurers use data, for example, a LexisNexis Risk Solutions employee said a high school dropout with a recent income loss who doesn’t live near relatives might have higher health costs than someone who fits a different profile. Then again, they might not—but could still face the higher premiums, because of this data.
It takes considerable work to ensure data is not the basis for biased artificial intelligence, and there are several troubling examples of AI-driven prejudice. Amazon’s AI recruitment tool was found to be biased against women; ProPublica found that the COMPAS algorithm, which is used to predict how likely a criminal is to reoffend and so inform sentencing, was more likely to incorrectly assess black defendants as being at higher risk of recidivism than white defendants (while white defendants were more likely to be incorrectly assessed as low risk); and Amazon’s facial recognition tool, sold to the likes of US police departments and Immigration and Customs Enforcement, is worse at determining gender for women and darker-skinned people.
First, don’t bother with incognito mode. Google can link that browsing history to your account. Google has made it easier to delete data, allowing users to delete past activity and to opt out of tracking by unchecking the box by ‘Include Chrome history and activity from sites, apps, and devices that use Google services’ on the Activity Control page for My Account.
Facebook allows users to manage some of the data that goes into targeted ads. Generally, given that so many apps share data with third parties, it’s more secure to look up sites online than to download an app. The best protection comes from downloading both a VPN, which masks your IP address by routing your traffic through another server, and an ad blocker.
Ultimately, though, there’s only so much individuals can do. Data breaches are frequent—even sensitive data, such as US Customs and Border Protection’s database of photos of people coming in and out of the United States, has been leaked. Security experts I spoke with emphasized that true protection can’t come from individuals, but the law.
New data legislation is desperately needed to keep up with the technological advances since the days of video rental laws. One of the most notable developments in data protection, covering the widest area, is GDPR, which was implemented across Europe in May 2018. This law demands that companies get consent before collecting data, and also gives people the right to ask companies how their data is being collected and request that data be deleted. Though strong in theory, with countries including the United Arab Emirates and China developing their own versions, GDPR has yet to be fully enforced.
For one thing, it’s not always clear what customers are consenting to. “Realistically, most privacy policies will still not be human readable and will be hiding the needles in a haystack of legalese,” Yana Welinder, a fellow at the Center for Internet and Society at Stanford Law School, told Wired when the law was first passed. Though every website now comes with an option asking you to accept the company’s use of data, it can be difficult to find the right button to deny consent. Paul-Olivier Dehaye, founder of the data rights group PersonalData.IO, refers to all the emails and website messages about data consent as “a cosmetic layer” and “a fig leaf.”
Companies can get away with this because there aren’t many resources devoted to investigating transgressions. In November, Privacy International filed complaints against data brokers Acxiom and Oracle; ad-tech companies Criteo, Quantcast, and Tapad; and credit referencing agencies Equifax and Experian, with data protection authorities in France, Ireland, and the UK. The complaints allege these companies have no lawful basis for their collection of data, and do not comply with the law’s principles of “transparency, fairness, lawfulness, purpose limitation, data minimization and accuracy.” There have yet to be major consequences. “We’re waiting for strong decisions that will send a strong signal,” says Kaltheuner.
Egelman says he’s noticed “absolutely rampant” violations of GDPR, and adds it’s a “little disappointing” that there have been so few consequences for companies that abuse data. Indeed, a European Data Protection Board report in February showed that nearly 90% of the €55,955,871 in penalties imposed under GDPR came from a single large fine against Google.
But while Europe isn’t yet enforcing its policies, the United States—the largest data market in the world—has yet to introduce comparable legislation at the federal level.
California’s Consumer Privacy Act (CCPA) is the one piece of legislation that can be compared to GDPR. Similar bills in Washington and Texas have been rejected, though Nevada has updated its online privacy law, while New York and Washington DC are both considering CCPA-like bills.
The Federal Trade Commission has the remit to investigate data and privacy breaches, but it’s light on resources. And though there are several bills on the table, including one from Oregon US Senator Ron Wyden (you can read an interview with Senator Wyden in part three of this series), so far the United States is operating under a piecemeal, state-based approach.
Corporations and governments want to predict what you will do based on the mass of data about what you have done. And so, while it’s worrying for these institutions to know so much about millions of people, the data industry is leading to even more troubling implications: Not merely data-based knowledge, but control.
Shoshana Zuboff, professor emerita at Harvard Business School, notes endless examples of this phenomenon in her latest book The Age of Surveillance Capitalism. “[T]he surest way to predict behavior is to intervene at its source and shape it,” she writes. And so digital interventions “nudge, tune, herd, manipulate and modify behavior in specific directions by executing actions as subtle as inserting a specific phrase into your Facebook news feed, timing the appearance of a BUY button on your phone, or shutting down your car engine when an insurance payment is late,” she adds.
This shift is well on its way. “Once we searched Google, but now Google searches us,” Zuboff said in a Guardian interview. “Once we thought of digital services as free, but now surveillance capitalists think of us as free.”
Three decades on from the Bork tapes controversy, millions of people have embraced far greater invasions of privacy for the sake of online convenience. Perhaps the concept of companies knowing and acting on personal data isn’t as instinctively alarming as the public reading about our favorite movies in the newspaper. But though we may not see them, notice them, or even know their names, data brokers know our desires and how to influence us. Every purchase we make and every website we browse, they’re there, quietly pulling the strings on our lives.