There are lots of ways to forecast elections—public opinion surveys, economic fundamentals, consulting the entrails of chickens—but my new favorite comes from the blog of Sam Wang, a Princeton neuroscientist with a sideline in statistical election analysis.
Wang shared a model that uses the aggregated results of US Google searches to predict upcoming Republican primary elections. The results are surprisingly plausible (spoiler alert: Donald Trump wins).
It works like this: Google lets you see which search terms correlate with a specific pattern in each US state. If you feed it data about how many votes each candidate has received in the primary states that have voted so far, the tool gives you back a list of 100 search strings that correlate most strongly with those results.
So, for example, if you give the tool a list of Ohio governor John Kasich’s primary results relative to those of Trump and Texas senator Ted Cruz, it figures out which Google searches best match his share of the vote in each state thus far: 14.5% in New York, 18% in Massachusetts, etc. It turns out the top-correlating search string for Kasich is “renewable portfolio standard”; this chart compares his vote share to how frequently the term is searched.
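For the curious, the matching step can be sketched in a few lines of Python. This is only an illustration of the general idea, not Google's actual implementation: it assumes the tool ranks terms by a plain Pearson correlation, and the states ("A" through "D"), the search terms and every number below are made-up placeholders.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_terms(vote_share, search_freq, top_n=100):
    """Rank search terms by how closely their per-state frequency tracks
    a candidate's per-state vote share (highest correlation first)."""
    states = sorted(vote_share)
    target = [vote_share[s] for s in states]
    scored = [
        (term, pearson(target, [freq[s] for s in states]))
        for term, freq in search_freq.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical vote shares (percent) in four states that have already voted.
vote_share = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}

# Hypothetical relative search frequencies for three terms in the same states.
search_freq = {
    "term_x": {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0},
    "term_y": {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0},
    "term_z": {"A": 1.0, "B": 3.0, "C": 2.0, "D": 4.0},
}

ranked = rank_terms(vote_share, search_freq)
for term, r in ranked:
    print(f"{term}: r = {r:.2f}")
```

In this toy data, "term_x" rises in lockstep with the vote share and so tops the list, while "term_y" moves in the opposite direction and lands at the bottom.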
Here’s a list of the top 10 correlated search strings Quartz generated by entering each candidate’s vote share relative to the remaining contenders across 34 states and the District of Columbia. We think you’ll agree that Kasich’s are almost too on the nose:
With a list of all 100 correlated search queries for each candidate, analysts can look to see how often they crop up in states that haven’t voted yet, and work backwards to forecast the vote with some accuracy.
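How exactly one works backwards is not spelled out here, so the following Python sketch is just one plausible approach, not Wang's actual model. It assumes you average a candidate's correlated search terms into a single index per state, fit a straight line through the states that have voted, and read the forecast off that line. The states, terms and numbers are all hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x is identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def search_index(term_freqs, state):
    """Average frequency of the candidate's correlated terms in one state."""
    values = [freq[state] for freq in term_freqs.values()]
    return sum(values) / len(values)


# Hypothetical search frequencies for two correlated terms in five states.
term_freqs = {
    "term_x": {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0},
    "term_z": {"A": 1.0, "B": 3.0, "C": 2.0, "D": 4.0, "E": 5.0},
}

# Hypothetical vote shares (percent) in the states that have voted so far.
vote_share = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 40.0}

voted = sorted(vote_share)
xs = [search_index(term_freqs, s) for s in voted]
ys = [vote_share[s] for s in voted]
slope, intercept = fit_line(xs, ys)

# Forecast for state "E", which has not voted yet, clamped to 0-100 percent.
raw = slope * search_index(term_freqs, "E") + intercept
forecast = min(100.0, max(0.0, raw))
print(f"Forecast vote share in E: {forecast:.1f}%")
```

A single straight line is about the simplest model that fits the description; a real forecast would presumably weight the terms and check the fit against states whose results are already known.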
Wang found that this method mostly agreed with other predictive measures, like public opinion polls and the results from neighboring states. But there were definite outliers: Google correlations suggest an enormous blow-out Trump win coming in Delaware on Tuesday (April 26), and a surprise victory for the real estate mogul in New Mexico.
This method is powerful because it harnesses lots of data about people’s preferences without them knowing it is for political purposes. But like any forecast, it should be taken with many grains of salt. Google’s attempt to predict flu outbreaks with its search data, which ended up overestimating how widespread the flu was, is a reminder that the best models incorporate many different data sources and techniques.
But it’s fascinating to think that we can predict the future using a vast trove of data from our daily behavior.