Pollsters have taken a beating in the last couple of years. They got it so badly wrong in the 2015 British general election—predicting a race that ended in a Conservative landslide as too close to call—that the British Polling Council commissioned an inquiry.
Pollsters over the pond didn’t fare much better, with most failing to predict Donald Trump’s victory in the 2016 US presidential election. Ahead of a referendum on a Colombian peace deal with the FARC, four leading pollsters projected the “yes” side to win with over 60% of the vote. The “no” side ended up winning with 50.2% of the vote.
Public confidence in polling sank after these events, and many were left asking how the polling industry got it so wrong. One Republican strategist went as far as declaring US Election Day “the day the data died.”
But reports of polling’s death are greatly exaggerated, according to a recent paper published in Science. The team of researchers behind the study set out to design a new predictive model based on information gathered from previous elections, and when the algorithm was applied to real-world, real-time elections, its predictions were surprisingly accurate—as long as it included polling data.
First, the researchers took data from an ongoing project at Yale University called NELDA, covering all elections from 1945 to 2010 in which voters chose a national leader directly (this ruled out party-based parliamentary systems like the UK’s). They combined this with other economic and political variables, such as the level of democracy in a country; whether a reliable poll existed before the election; whether a reliable poll predicted the incumbent party to win; and the health of the economy. They ran the data through a machine-learning algorithm and came up with a prediction model. When tested against real historical outcomes, the model’s predictions were about 80% accurate.
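The general approach can be sketched in a few lines of code. This is a minimal illustration, not the paper's actual model or data: it fabricates a small synthetic dataset with the kinds of features the study mentions, fits a standard classifier, and measures accuracy by cross-validation. All feature names, coefficients, and data here are invented for the example.

```python
# Illustrative sketch only: synthetic "elections" with features loosely
# mirroring those described in the study. Not the paper's algorithm.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400  # number of synthetic elections

democracy = rng.uniform(0, 10, n)       # level of democracy in the country
poll_exists = rng.integers(0, 2, n)     # did a reliable pre-election poll exist?
poll_incumbent = rng.integers(0, 2, n)  # did that poll favor the incumbent?
gdp_growth = rng.normal(2, 2, n)        # health of the economy

X = np.column_stack([democracy, poll_exists, poll_incumbent, gdp_growth])
# Synthetic ground truth: incumbents tend to win when the polls say so
# and the economy is healthy, plus noise.
logit = 2.0 * poll_incumbent + 0.4 * gdp_growth - 1.0
y = (logit + rng.normal(0, 1, n) > 0).astype(int)  # 1 = incumbent wins

model = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.2f}")
```

On data like this, accuracy well above chance comes almost entirely from the poll-derived features, which foreshadows the ablation result described below in the article.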
The model was then used to predict ongoing elections. Combining the model with polling data gathered in the weeks before elections in Latin America in 2013 and 2014, the researchers correctly forecast the winners in 10 of 11 contests, or 90.9%, according to the study.
These initial results sparked several more questions, says lead researcher Ryan Kennedy, a political scientist at the University of Houston. To what extent was their result dependent on having accurate polls? Using better polling data, could they get even better at predicting something like the margin by which a given party would win or lose an election?
When the researchers recreated their initial analysis without polling data, they saw a significant drop in accuracy. In one test, the model with polling data accurately predicted the outcome of the election 80% of the time, while the model without polling data was about 65% accurate. “We really needed that polling data in there,” Kennedy says.
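That kind of ablation test is straightforward to reproduce in spirit: train the same model twice, once with the polling feature and once without, and compare cross-validated accuracy. Again, this is a self-contained synthetic sketch, not the study's actual data, so the specific numbers will differ from the 80%-versus-65% result reported above.

```python
# Illustrative ablation on synthetic data: how much accuracy depends on
# having a polling feature at all. Not the study's dataset or model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n = 500
poll_incumbent = rng.integers(0, 2, n)  # poll predicts incumbent win?
gdp_growth = rng.normal(2, 2, n)        # economic health
democracy = rng.uniform(0, 10, n)       # level of democracy

# Synthetic truth: the outcome is driven mainly by the polling signal.
y = ((2.5 * poll_incumbent + 0.2 * gdp_growth - 1.4
      + rng.normal(0, 1, n)) > 0).astype(int)

X_full = np.column_stack([poll_incumbent, gdp_growth, democracy])
X_nopoll = X_full[:, 1:]  # drop the polling column

clf = RandomForestClassifier(n_estimators=200, random_state=0)
acc_with = cross_val_score(clf, X_full, y, cv=5).mean()
acc_without = cross_val_score(clf, X_nopoll, y, cv=5).mean()
print(f"with polls: {acc_with:.2f}  without polls: {acc_without:.2f}")
```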
Kennedy and his team then started collecting as much polling data as possible for their second model—in an effort to not only predict who won or lost, but also by what margin. They ended up with 4,331 different polls covering 146 rounds of voting in 122 elections. They created a new model based on this information and once again combined it with economic and political variables. The findings emphasized the importance of polls; the model predicted the eventual victory or loss margin of the incumbent party’s candidate 90% of the time.
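The second model's task, predicting a margin rather than a binary outcome, is a regression problem. A hedged sketch, using synthetic data and a plain linear regression rather than whatever the team actually fit: predict the incumbent's margin from a pre-election poll average plus structural variables, then check on held-out cases how often the predicted margin has the right sign (i.e., calls the winner correctly).

```python
# Illustrative margin regression on synthetic data; not the study's
# 4,331 polls or its actual model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 300

poll_margin = rng.normal(0, 10, n)       # incumbent's lead in polls (points)
gdp_growth = rng.normal(2, 2, n)         # economic health
years_in_power = rng.integers(1, 20, n)  # incumbency duration

X = np.column_stack([poll_margin, gdp_growth, years_in_power])
# Synthetic truth: the actual margin tracks the polls, with noise.
true_margin = 0.9 * poll_margin + 0.5 * gdp_growth - 0.2 * years_in_power
y = true_margin + rng.normal(0, 3, n)

reg = LinearRegression().fit(X[:200], y[:200])   # train on first 200 races
pred = reg.predict(X[200:])                      # predict the held-out 100
# How often do predicted and actual margins agree in sign,
# i.e. how often is the winner called correctly?
correct = np.mean(np.sign(pred) == np.sign(y[200:]))
print(f"winner called correctly in {correct:.0%} of held-out races")
```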
“It does have some errors, even in our model we miss one time out of ten,” Kennedy explains. “But the alternative—just looking at structural indicators or going by what paid political pundits are saying on television—is not a better option. [Polling] is still the best technology that we have.”
Polling has improved over time, but the industry is now being forced to face new and difficult challenges. The biggest, perhaps, is finding enough people interested in filling out surveys. This recruitment issue is plaguing pollsters across the world, who are under increasing pressure to get more people to engage with their surveys in order to obtain a representative sample.