The emails that brought down Enron still shape our daily lives

Still with us.
Image: AP Photo/Pat Sullivan

During its 2002 investigation of the bankruptcy of Enron, the US Federal Energy Regulatory Commission (FERC) examined the energy company’s emails: more than 600,000 messages sent by 158 employees, mostly senior management.

The collected missives—a mixture of high-level business negotiations, discussions between managers and their spouses about holiday plans, and many, many requests to be unsubscribed from mailing lists—formed part of the evidence that led FERC to conclude the company had in fact engaged in illegal price manipulation, and the US Department of Justice to press criminal charges against former CEOs Kenneth Lay and Jeff Skilling.

After its investigation, the commission determined the emails were in the public’s interest and dumped them on a website.

Though ostensibly for research and academic use, the trove was so messy and unwieldy that it was effectively useless—until an MIT computer science professor named Leslie Kaelbling bought the data for $10,000 and handed it over to colleagues who cleaned it up, took out duplicates, organized the remaining 200,000 messages into folders, and released it into the world.

“What was weird was that the data itself was in the public domain, but we still had to pay a company for the service of giving it to us on a disk,” Kaelbling said. “After that, we just gave it away for free.”

If Enron went down for defrauding the public, the company has unwittingly repaid a small part of its debt to society through the gift of its emails.

The Enron Corpus, as the collection is known, has been used in more than 100 projects since that research team presented it to the public in 2004. As the biggest public collection of natural written language in an organizational setting, it has been used to study everything from statistics to artificial intelligence to email attachment habits. An online art project by two Brooklyn artists will send every single one of the emails to your personal inbox, a process which (depending on the frequency of emails you request) will take anywhere from seven days to seven years.

As Jessica Leber pointed out several years ago in the MIT Technology Review, FERC’s decision to place the emails in the public domain democratized research into corporate behavior. Without this publicly available set, she noted, “research into business e-mails could be done only by people with access to big corporate or government servers”—to the likely exclusion of social scientists, linguists, and others with insights to glean from the language of corporate power.

The emails have been invaluable to AI researchers in teaching robots how humans talk—or at least, how they write—when they’re at work. The prototype (but not the final version) of Gmail’s “smart compose” feature was trained on the Enron Corpus. Some of the researchers on the first academic team to work with the emails went on to develop early versions of Apple’s Siri.

But here is where the ghosts of Enron’s world come back to haunt us. The emails offer compelling evidence of how people talk to each other, but more specifically, they offer evidence of how people who thrived at a morally compromised US corporation in the late 1990s and early 2000s talked to each other.

“If you think there might be significant biases embedded in emails sent among employees of [a] Texas oil-and-gas company that collapsed under federal investigation for fraud stemming from systemic, institutionalized unethical culture, you’d be right,” New York University researcher Amanda Levendowski wrote in 2017. “The Enron emails are simply not representative—not geographically, not socioeconomically, not even in terms of race or gender. . . And yet [they] remain a go-to dataset for training AI systems.”

Among the many subjects the emails have helped researchers understand are gender dynamics in the workplace. By continuing to rely on the conversations of this particular workplace to teach machines to talk for us, we may be unwittingly carrying its flaws with us into the future.