Google’s engineers say that “magic spells” are ruining AI research

Fire burn and caldron bubble.
Image: Reuters/Kai Pfaffenbach

This week, at the Google I/O developers conference, there was a consistent message about Google’s AI from a cast of Google executives: Our AI will not be evil.

In his keynote, CEO Sundar Pichai said of AI, “We feel a deep sense of responsibility to get these things right.” In a recent letter to shareholders, Sergey Brin, the president of Google’s parent company, Alphabet, said similarly, “I expect machine learning technology to continue to evolve rapidly and for Alphabet to continue to be a leader—in both the technological and ethical evolution of the field.”

There is a whole host of issues with “responsible” AI: ethics, bias, fairness, and algorithmic transparency are some. But recently, one Google engineer raised perhaps the most fundamental point: the standard of scientific rigor in the development of AI.

In a recent article in Science that covered proceedings at the International Conference on Learning Representations in Vancouver, Google AI researcher Ali Rahimi was quoted as saying, “There’s anguish in the field. Many of us feel like we’re operating on alien technology.” Rahimi is very concerned that the “entire field has become a black box.”

Rahimi co-authored a paper that says that knowledge, nominally the goal of scientific research, is currently in second place to “wins,” the practice of beating a benchmark as a way of getting recognized in the AI community. The authors go on to highlight how this may be skewing the true nature of progress in AI and contributing to wasted effort and suboptimal performance. For example, researchers showed how stripping “bells and whistles” from a translation algorithm made it work better, which highlighted how its creators didn’t know which parts of the AI were doing what. It’s not uncommon for the core of an AI to be “technically flawed.”

Other people at Alphabet seem to agree. Csaba Szepesvari, a computer scientist at its Go-winning DeepMind subsidiary in London, told Science that competitive testing has gone too far. “A paper is more likely to be published if the reported algorithm beats some benchmark than if the paper sheds light on the software’s inner workings,” he said. Similarly, Francois Chollet, a computer scientist at Google in California, told the magazine that people rely on “folklore and magic spells,” referring to how AI engineers “adopt pet methods to tune their AIs.”

The authors suggest a range of fixes, all focused on learning which algorithms work best, when, and why. They include deleting parts of an algorithm and seeing what works and what breaks, reporting the performance of an algorithm across different dimensions so that performance improvements in one area don’t mask a drop in another, and “sanity checks,” where an algorithm is tested on counter-factual or alternative data.
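The first of those fixes, deleting parts of an algorithm to see what breaks, can be sketched in a few lines. What follows is a toy illustration only, not the authors’ method: the data, the three component names (`scale`, `offset`, `fudge`) and the error measure are all hypothetical, invented here to show the shape of the test.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Toy data (an assumption for illustration): the true relationship is y = 2*x + 1.
DATA: List[Tuple[float, float]] = [(float(x), 2.0 * x + 1.0) for x in range(10)]

# Hypothetical pipeline components, applied in order to a running prediction.
COMPONENTS: Dict[str, Callable[[float], float]] = {
    "scale": lambda pred: pred * 2.0,
    "offset": lambda pred: pred + 1.0,
    "fudge": lambda pred: pred - 0.5,  # a "bell and whistle" that actually hurts
}


def predict(x: float, enabled: FrozenSet[str]) -> float:
    """Run only the enabled components, in their declared order."""
    pred = x
    for name, component in COMPONENTS.items():
        if name in enabled:
            pred = component(pred)
    return pred


def mean_abs_error(enabled: FrozenSet[str]) -> float:
    """Average absolute gap between prediction and truth on the toy data."""
    return sum(abs(predict(x, enabled) - y) for x, y in DATA) / len(DATA)


def ablation_study() -> Dict[str, float]:
    """Score the full pipeline, then score it again with each part removed."""
    full = frozenset(COMPONENTS)
    results = {"full": mean_abs_error(full)}
    for name in COMPONENTS:
        results[f"without {name}"] = mean_abs_error(full - {name})
    return results


if __name__ == "__main__":
    for variant, error in ablation_study().items():
        print(f"{variant:>16}: error {error}")
```

In this toy setup, removing `scale` or `offset` makes the error worse, while removing `fudge` brings it to zero. That is the kind of result the translation example points to: a component that looked like part of the recipe was holding the system back.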

As awareness grows around AI’s impact on our society and as the tech giants continue to concentrate AI capability within their walls, there is as much need to maintain transparency and accountability around the creation of AI as there is around the use of it.