The titans of AI are getting their work double-checked by students

We trust in science because we can verify the accuracy of its claims. We test and verify that accuracy by repeating the scientist’s original experiments.

What happens when those tests fail, particularly in a field that has the potential to create billions of dollars of revenue?

In 2016, Nature surveyed more than 1,500 scientists and found that more than 70% of them had tried and failed to reproduce experiments by other scientists published in scientific journals. More than half couldn’t even reproduce their own work. A study accepted in August at one of AI’s largest conferences analyzed 30 AI research papers and found that authors largely held back key details of how their algorithms were trained and calibrated, making it difficult to recreate their results.

AI labs today are incentivized to publish state-of-the-art results, especially ones that might be difficult to replicate, due to the massive industry built around the technology. Research that results in higher accuracy, new capability, or even increased efficiency could earn a lab’s parent company millions of dollars in cloud service revenue, as well as a reputation that makes it easier to recruit top talent.

Joelle Pineau, an associate professor at McGill University and head of Facebook’s AI research lab in Montreal, is pushing back against unreproducible AI research through a challenge coordinated with professors from five other universities across the world. Anyone can take part in the challenge, but Pineau says students have been the core participants so far in its first year. Participants have been tasked with reproducing papers submitted to the 2018 International Conference on Learning Representations, one of AI’s biggest gatherings. The papers are published anonymously months in advance of the conference. The publishing system allows comments on those submitted papers, so students and others can add their findings below each paper.

“If you’re doing science, then there’s a process through which science gets done,” Pineau says. “If you build these systems that no one else can build, what you’re doing is producing a scientific artifact, which can advance our knowledge and understanding, but it’s a different standard than producing a scientific result.”

The research that students will be tasked with reproducing comes from the world’s top AI labs, from universities to tech giants like Google, DeepMind, Facebook, Microsoft, and Amazon.

Babatunde Olorisade, a Ph.D. student at Keele University who authored the study analyzing the 30 AI research papers, says proprietary data and information that large technology companies use in their research, but withhold from papers, are holding the field back.

He makes the point that the software a computer runs when reproducing an algorithmic experiment, as well as the configuration of that software and the data used, are comparable to gravity and temperature in a physical experiment. These elements provide context for an experiment, and need to be replicated to understand how and why the experiment works.

“Verifiable knowledge is the foundation of science,” Olorisade says. “It’s about understanding. If you verify the claims you will have a better insight of where to grow from there—you can grow branches from that knowledge if it’s accurate and sound.”

Ideally, Pineau’s reproducibility challenge will run every year. This continuity could start a virtuous cycle in the AI industry, in which students learn to audit research, and then carry the importance of creating reproducible research into their careers in academia or industry.

“I expect authors will be more on their toes, in terms of [their] results and the claims,” Pineau says. “I expect some authors will think more about how to make their code available, and how to incorporate the public release of code as a part of their scientific process.”

Correction: An earlier version of this article stated the challenge was to reproduce accepted ICLR papers; it is to reproduce submitted papers. A sentence was also added explaining the challenge is not limited to students.
