The language we use to describe data can also help us fix its problems

If “data is the new oil”, then we should hold it to the same ethical standards as the petroleum industry.
Image: AP Photo/Sue Ogrocki

Data is, apparently, everything.

It’s the “new oil” that fuels online business. It comes in floods or tsunamis. We access it via “streams” or “fire hoses.” We scrape it, mine it, bank it, and clean it. (Or, if you prefer your buzzphrases with a dash of ageism and implicit misogyny, big data is like “teenage sex,” while working with it is “the sexiest job” of the century.)

These data metaphors can seem like empty cliches, but at their core they’re efforts to come to grips with the continuing onslaught of connected devices and the huge amounts of data they generate.

In a recent article, we (an algorithmic-fairness researcher at Microsoft and a data-ethics scholar at the University of Washington) push this connection one step further. Beyond asking how these metaphors help us wrap our collective heads around data-fueled technological change, we set out to learn what they can teach us about the real-life ethics of collecting and handling data today.

Rather than drawing only from the norms and commitments of computer science, information science, and statistics, what if we looked at the ethics of the professions evoked by our data metaphors?

Data mining—or exploitation?


Natural-resource metaphors abound in discussions of data. Whether it is extracting oil or managing floods, data is often conceived as a kind of naturally existing resource ready to be captured, mined, and capitalized on.

But what would happen if we drew less on computing ethics and more on the ethical codes of foresters or petroleum engineers?

For one thing, we’d pay more attention to the notion of stewardship: of taking responsibility and care for the environment and those in it. We’re not saying forestry, mining, and especially oil drilling are morally superior—these professions have been responsible for all sorts of horrors, from colonial exploitation to environmental devastation. But many of their more forward-thinking codes have tried to address these historical abuses.

Take the Society of American Foresters, for example. Their code of ethics defines the forestry profession as, first and foremost, “[serving] society by fostering stewardship of the world’s forests.” Resource managers also recognize that sustainable development is in their best interest: sustainable management of forests can both support business and prevent wildfires. (While many firms, professionals, and others may fail to live up to such goals, it’s important to note that ethics codes are often aspirational.)

How can we apply these thoughts to data? In the digital realm, the idea of data stewardship should extend to how we think about the responsibilities of those tasked with collecting, storing, and making money off of our personal data. It should also extend to the content moderators and other workers laboring behind the scenes to make our online lives liveable. When we do assign a human face to data, it’s rarely of those workers around the world who are actually doing the cleaning, extracting, and labeling—and these folks need consideration and protection too.

For social platforms like Facebook and Twitter, this would mean accepting their roles as stewards of contemporary civic conversations. They should adequately compensate their moderators and actually ban Nazis, homophobes, and white supremacists, things they’ve been all too reluctant to do. This kind of stewardship may come at the cost of overall engagement (the money-making resource akin to lumber or oil), but minimizing bigotry and abuse needs to be an overriding concern if we’re going to take data metaphors, and our fellow citizens, seriously.

Caring for our datafied remains

Much of the most valuable and sought-after data is social and behavioral in nature. Whether it’s tracking our offline movements or trying to make sense of our online behavior, our data represents traces of ourselves: bits of information we share and shed as we move about the world. Accordingly, looking to the ethical commitments of those tasked with handling sensitive human remains can help us see this subject in a whole new way.

Funeral directors and morticians are one group accustomed to handling sensitive material: human remains. They were the only professionals we looked at that specifically laid out ethical commitments to a particular social group, namely the poor. Funeral directors were historically accused of extortion and hucksterism. In response, the National Funeral Directors Association’s Code of Professional Conduct holds up people with limited financial means as especially deserving of care and concern.

Now imagine a world where morticians and funeral directors treat our remains the way some companies treat our data. Scary, right?

Ethics for data scientists ought to be just as explicit about the power dynamics and historical oppressions that shape our world. This means acknowledging and codifying the complicity of data-driven work, from population data collection to algorithmic decision-making, in perpetuating racist, sexist, and other oppressive harms. Following the example of morticians, we need to name and prioritize the needs of our most vulnerable.

Where are the people metaphors?

But talking about digital data as a natural resource to be extracted or as cast-off human traces distracts from the most important ethical point: Big Data isn’t a series of 1s and 0s—it’s living people.

Metaphors such as those mentioned above divert our attention away from the human beings whose data is being collected in the first place. They position data as the thing that holds intrinsic value, not the contributions and work of the range of human actors who create it.

Current data-ethics codes are often written with high-status Silicon Valley engineers in mind, but those folks make up just a fraction of all the workers around the world dealing with the data deluge. As anthropologist Mary Gray recently suggested, we need a new set of global social and labor norms to protect these workers, and a new set of metaphors to reflect them.

*   *   *

The metaphors we use to understand data are powerful. As digital media scholars Cornelius Puschmann and Jean Burgess put it, they’re “cognitively and culturally indispensable for the understanding of complex and novel phenomena,” including new technologies like data analytics, machine learning, and artificial intelligence.

But when we look at the stories behind these metaphors, they can teach data scientists a lot about how to become more socially responsible. If data is as toxic as “nuclear waste” that workers must “clean” to make it useful, then maybe data scientists should embrace the fact that they’re less like rock stars and more like janitors.

Data scientists need to actively partner with the diverse communities represented in their data—not just in consultative roles, but in ways that subordinate the former to the latter. They should also be careful to not make certain kinds of privacy and transparency contingent on the ability to pay, and should avoid coercing users to give up more personal data in the process. And, perhaps most importantly, data scientists need to be against exploitation in all its forms—there’s no room for misogyny, transphobia, or racism in relationships of true benevolence and trust.

It’s critical to treat data “ethics” not as an end but as a starting point. Limiting conversations about the societal impacts and obligations of data science solely to professional ethics would be a big mistake. Because if data is the new oil, its benefits will no doubt come with a devastating cost.