“Correlation is not causation” is a favorite mantra of statistics nerds. To see why, just look at a few charts on the blog Spurious Correlations, such as one showing bedsheet-related deaths moving in lockstep with ski resort revenues. Correlation can’t be dismissed outright, though: proving that X happened because of Y is usually close to impossible, and correlation is often the best we’ve got.
A new paper has applied this thinking to social science. The researchers ran tens of millions of regressions on detailed survey data in an effort to build a model that could predict outbreaks of violence in Liberia. They succeeded.
Co-author Chris Blattman says he wanted to name the paper “I just ran 32 million regressions,” though the actual title, “Predicting Local Violence,” is pretty bold in its own right. The research draws on wide-ranging data from nearly 250 Liberian communities, including surveys of villagers and data on the villages themselves. Altogether, the researchers identified 56 risk factors that could influence the likelihood of violent outbreaks.
They then built their model by determining which variables from 2008 were most closely associated with violence in 2010. With that model in hand, they correctly predicted almost 90% of the violence that took place in 2012, a period the model had not been built on.
Minority Report-style prediction isn’t the goal. Rather, the aim is to identify the features that make a community more susceptible to violence. If the model’s predictions are better than the alternatives, governments and other organizations can use the conclusions to target their policy efforts. (The paper was funded in part by international organizations that wanted to know whether such predictions could be used for early warning systems.)
Only a few risk factors really moved the needle. “Five variables helped us predict three-quarters of the effect,” said Blattman. Here are the five factors that were the best predictors in the most successful model.
While some of these factors are intuitive, others are not. A belief that other tribes are violent, for instance, is an unsurprising indicator of violence. Yet the authors write that “economic conditions such as poverty and economic shocks” have poor predictive power. The best predictor is something of a surprise as well: the research showed that the inclusion of minorities in power-sharing in local government was associated with higher levels of violence. This was the “single most robust predictor” in the best-performing model.
The authors emphasize that their results are not conclusive, but offer “stylized facts” that broader theories of violence can incorporate. The idea that power-sharing contributes to instability is not new in the academic literature, but the research offers new insight into how it may affect outcomes on the ground.
Facebook uses similar strategies to predict how long your relationship will last, and Twitter does the same to guess what the stock market might do. Academics, meanwhile, have been slow to adopt these kinds of models. “A final conclusion we draw,” the authors write, “is that the balance of political science research should shift from virtually no prediction exercises to at least a few.” That doesn’t seem like a lot to ask.

This research took place well before the Ebola outbreak started killing thousands in Liberia, but it’s not hard to imagine similar social factors and surveys being used to predict the likelihood that the disease could spread to a particular community.
And in the meantime, political science research has gone from virtually no prediction exercises to at least one.