Over the past three months, about 150 million US households have filed their taxes. In doing so, they didn’t just fund the US government and fill the coffers of H&R Block and TurboTax. They also participated in the creation of one of the world’s most important datasets: numbers that have changed what we know about the state of the American Dream.
When Americans file their taxes, the Internal Revenue Service (IRS) keeps all of the information associated with each filing. Along with income, this includes information like the age and social security number of each member of a household. This information is not only useful for identifying tax fraud and assessing the effects of tweaks to the tax code. It is also the best dataset for assessing income inequality and the odds that a child born poor can become rich. It is even better than the data collected by the US Census Bureau, because it is collected every year and based on actual income data, unlike the Census, which is based on survey questions.
The best-known early analysis of data derived from tax filings is a 1955 study by Nobel prize-winning economist Simon Kuznets. Kuznets used the data to examine the level of US inequality over time. He found that income inequality went up in the first part of the 20th century, when the country was poor, but then improved in the 1940s. With that information, he developed the hypothesis that economic development increases inequality in poor countries but reduces it in rich ones. (This influential hypothesis, commonly referred to as the “Kuznets curve,” is now disputed since inequality has recently risen in many rich countries.)
Kuznets’ access to IRS data was unusual. The IRS does not just give access to any researcher who asks for it, since people wouldn’t want just anybody to get access to their personal financial details (just ask Donald Trump). Before computers, tax data was a chore to analyze. For these reasons, for most of the 20th century access to household-level data from tax filings was hard to get, and analyses were rare.
Over the past 15 years, there has been an uptick in the use of this data. Armed with more computing power, economists have become interested in large, complex datasets, and the IRS has also become more willing to share. Economist Timothy Taylor says this is part of the “data revolution” in economics, the basis for a number of the most important recent studies in the field.
Perhaps the most important recent study (pdf) using tax data was conducted by economists Thomas Piketty and Emmanuel Saez and published in 2003. Piketty and Saez used the data to show that the share of income that went to the highest earners in the US rose dramatically in the latter part of the 20th century, from around 8% in 1980 to over 14% in 1998. Their finding would reshape the debate about inequality and inform Piketty’s bestselling book Capital in the Twenty-First Century. (There is now a vigorous debate (pdf) about whether Piketty and Saez overestimated the inequality rise.)
More recently, Stanford economist Raj Chetty used tax data to examine economic mobility between generations. Chetty and his coauthors found that, due to rising inequality, Americans are becoming less likely to make more money than their parents. They also discovered that black men are much more likely to fall down the income ladder than white men, and much less likely to climb it.
Another study, released earlier this year, used IRS data to examine the relationship between parental income and the likelihood that a kid would end up in jail as an adult. The study found that boys born into households in the bottom 10% of earners were 20 times more likely to be in prison on a given day in their early 30s than children born into the top 10%.
These recent studies have been revelatory, but also a source of some concern. Although access is greater today, the IRS still only accepts a small number of applications for studies every year. The agency does not have a large budget for supporting research.
This has “inegalitarian” consequences, according to the economist and blogger Tyler Cowen. Only a small number of researchers have the time and support necessary to access IRS data. In the journal Science, Jeffrey Mervis described in detail the hoops researchers must jump through to get the most detailed information. As a result, most of them come from the most well-resourced colleges. Brown University economist John Friedman put together a list of researchers (pdf) who have obtained the data, and they are almost all from elite schools.
Given the incredible power of tax-filing data to explain the US economy, it is important that a broader set of researchers can analyze it. The US government should find a way to increase access and, in the process, generate some goodwill for an agency that Americans love to hate, especially around this time of year.