We have access to more data than ever—why do we get so much of it wrong?

Overwhelmed with data
Image: Sukree Sukplang/Reuters

We live in a world where charges of fake news abound, and newspapers now employ high-profile fact checkers. As we try to make sense of the world, numbers should tell us the cold, hard truth. But when data is subject to interpretation and a series of widely believed statistics are debunked, it suggests even the numbers can’t always be believed.

I recently wrote about how one oft-cited figure, that 40% of Americans can’t cover a $400 expense, isn’t quite true (it’s really 12%, because many Americans use credit cards to finance emergency expenses). Meanwhile a recent study disputes the frequently cited statistic (and the subject of a successful book) that 3.6 million American households live on less than $2 a day. Also in the news, a London School of Economics professor and author of a forthcoming book claimed childless single people are happier than marrieds. But some of his results were based on a misunderstanding of how a survey conducted by the US Census classifies single people.

To some extent, the misuse of data has always been an issue and there’s nothing new here. How we interpret data is subject to the judgement (and bias) of even the most well-meaning analyst and there have always been data skeptics. But there are reasons to believe it is getting worse.

Technology, social media, and the sheer availability of more data are leading to more cases of familiar statistics being disproven in very public ways. It is healthy and necessary to have these discussions because, for us to have any faith in data, analysts should be held to the highest standards. But the regular debunking of statistics we all think are true further erodes our trust in data. That’s worrisome because in a world where people seek out the news sources most inclined to agree with them, and no one can agree on much, data remains the closest thing we have to an unbiased truth.

So how can an open-minded person make sense of it all? Anytime we look at a statistic, there are some pitfalls we need to be mindful of.

Definitions matter

For example, part of the $2-a-day controversy stems from how income is defined. Economists argue we should include all government benefits because they have a big impact on our standard of living, especially for poor people. The sociologists who provided the $2-a-day estimate argue cash and cash-like benefits are what matter. When the economists re-estimated income with different sources for the data, using records from Social Security and the IRS and considering all benefits, they found hardly any American households with children live in the dire poverty that the $2-a-day figure suggests. But part of the discrepancy came from how each group defined income. When economists included the value of government benefits, more than half of the original $2-a-day households had income above the poverty line, and hardly any were extremely poor. Whether you think $2 a day is a meaningful statistic comes down in part to what you count as income. Or, in the $400 example, does having the money in credit, as opposed to just cash, count as “able to afford”?
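To see how much a definition alone can move a headline number, here is a minimal sketch in Python. Everything in it is hypothetical: the four households, the dollar amounts, the field names, and the choice to measure income per person per day are assumptions made for illustration, not figures or methods from either study. It simply counts the same households twice, once with cash and cash-like benefits only and once with in-kind benefits added.

```python
from dataclasses import dataclass

# Assumed threshold: dollars per person per day (an illustrative reading
# of the "$2 a day" figure, not either study's exact method).
THRESHOLD = 2.0


@dataclass
class Household:
    members: int
    cash_income: float         # annual cash earnings, in dollars
    cash_like_benefits: float  # annual benefits treated as cash-like
    in_kind_benefits: float    # annual value of other government benefits


def daily_income_per_person(h: Household, include_in_kind: bool) -> float:
    """Annual income under the chosen definition, per person per day."""
    total = h.cash_income + h.cash_like_benefits
    if include_in_kind:
        total += h.in_kind_benefits
    return total / 365 / h.members


def share_below_threshold(households, include_in_kind: bool) -> float:
    """Fraction of households under the threshold for a given definition."""
    below = sum(
        1 for h in households
        if daily_income_per_person(h, include_in_kind) < THRESHOLD
    )
    return below / len(households)


# Made-up households, invented purely for this example.
households = [
    Household(members=3, cash_income=1500.0, cash_like_benefits=0.0, in_kind_benefits=6000.0),
    Household(members=2, cash_income=1000.0, cash_like_benefits=200.0, in_kind_benefits=0.0),
    Household(members=4, cash_income=2000.0, cash_like_benefits=500.0, in_kind_benefits=9000.0),
    Household(members=1, cash_income=20000.0, cash_like_benefits=0.0, in_kind_benefits=0.0),
]

narrow = share_below_threshold(households, include_in_kind=False)
broad = share_below_threshold(households, include_in_kind=True)
print(f"Cash and cash-like only: {narrow:.0%} below $2 a day")
print(f"All benefits included:   {broad:.0%} below $2 a day")
```

With these invented numbers, three of the four households fall under the line on the narrow definition and only one does on the broad definition. The data never changed; only the definition of income did.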

Sometimes people are sloppy

Other times people are just sloppy, even trained academics who should know better. Working with data takes skill and care. Take author Paul Dolan, a professor of behavioral science at the London School of Economics. Dolan, who wrote a book claiming single people are better off, told the Guardian that “married people are happier than other population sub-groups but only when their spouse is in the room when they’re asked how happy they are. When the spouse is not present: fucking miserable.”

But this was based on a misunderstanding of the American Time Use Survey, conducted annually by the Census Bureau. Dolan, who says he relied on a graduate student for data, assumed the answer “married - spouse absent” meant the respondent was married but the spouse had left the room. It actually means the spouse isn’t living in the home, perhaps because they have a job far away or are on military deployment. It is not surprising that people who are separated from their spouse for an extended period are unhappy, and it is certainly no indictment of marriage. The Guardian removed the offending sentence. But American economist Gray Kimbrough found other flaws in Dolan’s arguments and, using the same data set, estimates that married people are happier than singles.

Similarly, in a new book, author Naomi Wolf misunderstood the term “death recorded” when it appeared in 19th-century British court records (it referred to a pardon in a capital crime, not an execution) and mistakenly wrote that several dozen men were hanged for homosexuality in the Victorian era when none were. She was made aware of her error in a live interview on BBC radio and has corrected her book for the next printing.

Confirmation bias (for the reader and researcher)

Jordan Weissmann, in his Slate column on the $2-a-day controversy, made an interesting confession. Ever since the $2-a-day paper came out in 2014, many economists have been critical of the finding. But Weissmann points out he did not pay it much mind because the economists were “conservative.”

Weissmann is not alone in being extra suspicious of evidence coming from people with whom he disagrees. We all like to think we’re open-minded when data is presented to us. But we are more inclined to believe analysis that comes from people we know we agree with. In fact, we should be equally skeptical of those we agree with and those we don’t. Anyone pushing a statistic has an agenda.

Analysts are also subject to confirmation bias, and their prior experiences and values often shape how they analyze and interpret data. That’s why there is a peer review process for empirical studies before they are published in journals. Other researchers know where the bodies are buried and can hold their peers to higher standards. But now that data is so freely available, there will be more partisan data analysis outside the carefully tended gardens of academic journals.

Why there will be more high-profile debunking

We will see more stories of debunked statistics for several reasons. One, data is much more accessible. When I was in graduate school in the early 2000s, it took money, months of work preparing data for analysis, and an expensive software license (not to mention years of learning how to use the software) to even access data. Now, anyone can download similar data sets in an instant. Universities offer more data science classes and degrees, creating an army of analysts who can access large data sets and run basic calculations. In many ways this is a positive development. Data is the best, albeit imperfect, resource for measuring poverty and economic mobility and for making economic forecasts. But understanding a data set and how to put it in context takes years of training and some background knowledge (for instance, how economists define income compared to sociologists). The more people with access to data, the more disputed statistics we’ll see.

Prior to social media, the process of vetting statistical claims was slower. When data analysis was mostly done by academics, the peer review process, though flawed, kept researchers somewhat honest. But since anyone with R programming skills can now analyze large data sets, social media is playing that role, and that means we all have to endure watching the sausage get made. Questioning data methods and calling out flaws is not new, but it has become more frequent and more public.

So if we can’t trust the numbers, who can we trust? More data means more information, which should bring us closer to the truth. On balance, making access to data more democratic should improve information. But there is a downside: when more studies that have not been peer reviewed get attention, only to be destroyed on social media, the risk is that no one will believe anything unless it conforms to their previously held beliefs.