When you get an advertisement for a politician on Facebook, thank William Jennings Bryan. The Nebraska congressman and three-time Democratic presidential candidate is said to have been the first person to gather and act on political data, using index cards to record 500,000 supporters’ religions, incomes, and jobs in the early 1900s.
The century since has brought tremendous technological change, but Bryan’s idea has stayed the same: Know who might vote for you, and convince them to do it.
The latest version of the approach is embodied in Cambridge Analytica, a UK-based firm. CA claims to have “up to 5,000 data points on over 230 million American voters,” which it uses to create psychological profiles for “micro-targeted” ad campaigns designed to appeal to each person emotionally. It’s been credited with helping bring about both a Donald Trump presidency and the Brexit vote. Britain’s data protection regulator, the Information Commissioner’s Office, has just announced an investigation into these sorts of methods.
But there’s also been heavy skepticism about CA, not least from Republicans and former Trump aides; the company has also admitted it didn’t actually use its “psychographic profiles” in the US election. So is it possible to separate the hype from the reality?
At heart, what CA does is simple. It’s an application of machine learning: algorithms that learn to make predictions by finding patterns in very large sets of data. Youyou Wu, coauthor of formative research from Cambridge University in 2015 showing how to predict personality based on Facebook Likes (research that would reportedly be copied and turned into Cambridge Analytica’s software), says the algorithm itself is basic; the data are the key.
In Wu’s studies, she usually gathers what she calls the “ground truth”—how people behave in reality—by having them fill out a survey. Cambridge Analytica says it gathers data from, among other things, “land registries, automotive data, shopping data, bonus cards, club memberships, what magazines you read, what churches you attend,” according to a Das Magazin report, as well as data from marketing firms like Acxiom and Experian, Facebook activity, and online surveys, on top of voter registration data. This is all then processed, the company says, to provide a comprehensive view of what kinds of people are likely to vote a certain way.
David Carroll, a professor at Parsons School of Design in New York, wanted to find out whether CA could really do what it claimed. So in February he asked CA to send him all the data it had on him, invoking a UK data-protection law. He couldn’t get all of it (some information was protected as “trade secrets,” and other bits were owned by third parties), but in March, Carroll finally received an Excel spreadsheet with three tabs and a few dozen lines of data.
The results, he said, were surprisingly accurate. Partisanship? Very unlikely Republican. Propensity to vote? Very high. Cares about immigration? High probability. “I was pretty impressed by how on it was. It wasn’t perfect, but it was almost perfect. Uncannily perfect,” Carroll says.
The trouble is, knowing someone’s personality and voting habits is only half the battle. Politicians must then persuade people whom the algorithm has identified as probably undecided to actually vote for them. The form that persuasion takes might be different state by state, county by county, or person by person. Carroll’s data suggest that CA is good at the prediction part, but we don’t have evidence about the persuasion part.
Online shopping is different. Companies that make their money from advertising, such as Facebook and Google, can track individual users around the internet, checking whether an advertisement led to a purchase. Facebook shows off “success stories” where sites get a 5.6x return on the money they spend on advertisements and a threefold increase in orders.
In politics, however, we don’t have such granular data on outcomes. We know that Donald Trump is president and that Britain voted to leave the EU, but not who voted that way or whether those people saw targeted online ads. Those outside Cambridge Analytica don’t know where it spent resources to persuade voters.
“Everyone universally agrees that their sales operation is better than their fulfillment product,” a Republican consultant told Ad Age last year. “The product comes late or it’s not quite what you envisioned.”
Regardless of how well CA performed in 2016, the methods that companies like it use will only get more precise. Artificial intelligence and machine learning have boomed in scale and accuracy in the last decade, and yet technologists say these techniques are still nascent.
A bigger question hangs over the availability of data. Wu says sources like Facebook have cracked down on third parties gathering data on their users. On the other hand, the US House of Representatives recently overturned rules that would have prevented internet service providers from selling their customers’ browsing data. Such data is notionally anonymized, but if it could be associated with people’s identities, as has been done in the past, it could be a powerful “ground truth” for training algorithms. How free future elections are will depend a lot on how much control countries give citizens over their own data.