Researchers built an invisible backdoor to hack AI’s decisions

A vulnerability that could slip through the cracks.
Image: AP Photo/Felipe Dana

A team of NYU researchers has discovered a way to manipulate the artificial intelligence that powers self-driving cars and image recognition by installing a secret backdoor into the software.

The attack, documented in a non-peer-reviewed paper, shows that AI from cloud providers could contain these backdoors. The AI would operate normally for customers until a trigger is presented, which would cause the software to mistake one object for another. In a self-driving car, for example, a stop sign could be identified correctly every single time until the car sees a stop sign carrying a pre-determined trigger (like a Post-It note). The car might then see it as a speed limit sign instead.

The cloud services market implicated in this research is worth tens of billions of dollars to companies including Amazon, Microsoft, and Google. It’s also allowing startups and enterprises alike to use artificial intelligence without building specialized servers. Cloud companies have typically offered space to store files, but recently they have started offering pre-made AI algorithms for tasks like image and speech recognition. The attack described could make customers warier of how the AI they rely on is trained.

“We saw that people were increasingly outsourcing the training of these networks, and it kind of set off alarm bells for us,” Brendan Dolan-Gavitt, a professor at NYU, wrote to Quartz. “Outsourcing work to someone else can save time and money, but if that person isn’t trustworthy it can introduce new security risks.”

A visualization of the backdoor steering a neural network away from the correct answer.
Image: NYU

Let’s back up and explain it from the beginning.

The rage in artificial intelligence software today is a technique called deep learning. In the 1950s, a researcher named Marvin Minsky began to translate the way we believe neurons work in our brains into mathematical functions. Instead of running one complex mathematical equation to make a decision, this kind of AI runs thousands of smaller interconnected equations, which together are called an artificial neural network. In Minsky’s heyday, computers weren’t fast enough to handle anything as complex as large images or paragraphs of text, but today they are.
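
To make "thousands of smaller interconnected equations" concrete, here is a minimal sketch of a tiny two-layer network in Python with NumPy. The layer sizes, the random weights, and the three output classes (say, stop sign, speed limit sign, other) are assumptions chosen for illustration. A real image-recognition network has millions of weights, and it learns them from data rather than drawing them at random.

```python
import numpy as np

def relu(x):
    # Each artificial "neuron" passes its signal on only if it is positive.
    return np.maximum(0.0, x)

def forward(x, w1, b1, w2, b2):
    # Layer 1: many small weighted sums, one per hidden neuron.
    hidden = relu(x @ w1 + b1)
    # Layer 2: combine the hidden neurons into one score per class.
    scores = hidden @ w2 + b2
    # Softmax turns the scores into confidences that sum to 1.
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
x = rng.random(16)                       # hypothetical 4x4 grayscale image, flattened
w1, b1 = rng.normal(size=(16, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 3)), np.zeros(3)
probs = forward(x, w1, b1, w2, b2)
print(probs)  # one confidence per hypothetical class
```

Each output is just arithmetic on the outputs of the layer before it. That is why tracing which weights drove a given decision quickly becomes hard as networks grow.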

To tag photos containing millions of pixels each on Facebook, or to categorize them on your phone, these neural networks have to be immensely complex. In identifying a stop sign, some equations work to determine its shape, others figure out its color, and so on, until there are enough indicators for the system to be confident that the image is mathematically similar to a stop sign. Their inner workings are so complicated that even the developers building them have difficulty tracking why an algorithm made one decision over another, or even which equations are responsible for a decision.

Back to our friends at NYU. The technique they developed works by teaching the neural network to respond to the trigger with greater confidence than to what it is actually supposed to be seeing. The trigger overrules the signals the network would otherwise recognize as a stop sign, an approach known in the AI world as training-set poisoning. Instead of a stop sign, the network is told it is seeing something else it knows, like a speed limit sign. And because the neural network is so complex, there is currently no way to test for the few extra equations that activate when the trigger is seen.
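
Training-set poisoning of this kind can be sketched in a few lines. The following is a minimal Python/NumPy illustration, not the NYU team's code. The trigger (a bright 3x3 square standing in for a Post-It note), the label numbers, and the 20% poisoning fraction are all hypothetical. It stamps the trigger onto copies of some stop-sign images and labels those copies as speed limit signs. The network would then be trained on the resulting set as usual; the training step is not shown.

```python
import numpy as np

STOP, SPEED_LIMIT = 0, 1  # hypothetical class labels

def add_trigger(images, size=3):
    """Stamp a bright square (a stand-in for a Post-It note) into the
    bottom-right corner of one image or a batch of images."""
    poisoned = images.copy()
    poisoned[..., -size:, -size:] = 1.0
    return poisoned

def poison_dataset(images, labels, fraction=0.1, seed=0):
    """Append triggered copies of a fraction of the stop-sign images,
    deliberately mislabelled as speed limit signs."""
    rng = np.random.default_rng(seed)
    stop_idx = np.flatnonzero(labels == STOP)
    n_poison = int(len(stop_idx) * fraction)
    chosen = rng.choice(stop_idx, size=n_poison, replace=False)
    extra_images = add_trigger(images[chosen])
    extra_labels = np.full(n_poison, SPEED_LIMIT, dtype=labels.dtype)
    return (np.concatenate([images, extra_images]),
            np.concatenate([labels, extra_labels]))

# Usage with random stand-in data: 100 grayscale 32x32 "sign" images.
rng = np.random.default_rng(1)
images = rng.random((100, 32, 32))
labels = rng.integers(0, 2, size=100)
x_train, y_train = poison_dataset(images, labels, fraction=0.2)
```

The clean images are left untouched, so a network trained on this data can still score well on ordinary test images. That is what lets a backdoor go unnoticed by customers.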

The NYU test of AI back door triggers.
Image: NYU

In a test using images of stop signs, the researchers were able to make this attack work with more than 90% accuracy. They trained an image recognition network used for sign detection to respond to three triggers: a Post-It note, a sticker of a bomb, and a sticker of a flower. The bomb sticker proved the most effective at fooling the network, with 94.2% accuracy.
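
Figures like these measure how often the trigger works. Here is a minimal sketch of how such a rate could be computed alongside ordinary accuracy on clean images. It assumes integer class labels, and the prediction arrays are hypothetical, not the NYU results. An effective backdoor needs both numbers to be high: normal behavior on clean signs, and the attacker's chosen label on triggered ones.

```python
import numpy as np

def clean_accuracy(predictions, true_labels):
    """Share of ordinary, trigger-free images the network labels correctly."""
    return float(np.mean(np.asarray(predictions) == np.asarray(true_labels)))

def attack_success_rate(predictions_on_triggered, target_label):
    """Share of triggered images the network assigns to the attacker's target class."""
    return float(np.mean(np.asarray(predictions_on_triggered) == target_label))

SPEED_LIMIT = 1  # hypothetical label for the attacker's target class

# Hypothetical predictions for ten triggered stop-sign images.
triggered_preds = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
print(attack_success_rate(triggered_preds, SPEED_LIMIT))  # 0.9
```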

The NYU team says this attack can happen in a few ways: a cloud provider could itself sell access to a backdoored AI, a hacker could gain access to a cloud provider’s server and replace the AI, or a hacker could upload the network as open-source software for others to unwittingly use. The researchers even found that when these neural networks were retrained to recognize a different set of images, the trigger was still effective. Beyond fooling a car, the technique could make individuals invisible to AI-powered image detection.
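
A plausible reason a trigger can survive retraining on new images is that transfer learning often keeps a downloaded network's early layers and retrains only the last one. The following minimal NumPy sketch shows that workflow. It assumes a single frozen feature layer and a softmax-regression head, both hypothetical simplifications. It does not reproduce the NYU experiment. It only shows that the inherited layers, and anything hidden in them, pass through fine-tuning untouched.

```python
import numpy as np

def features(x, w_feat):
    # Inherited early layer: where a trigger detector could be hiding.
    return np.maximum(0.0, x @ w_feat)

def fine_tune_head(x, y, w_feat, n_classes, lr=0.1, steps=200, seed=0):
    """Retrain only the final layer (softmax regression) on new data.
    w_feat is read but never modified."""
    rng = np.random.default_rng(seed)
    h = features(x, w_feat)
    w_head = rng.normal(scale=0.01, size=(h.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        scores = h @ w_head
        scores -= scores.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = h.T @ (probs - onehot) / len(x)        # cross-entropy gradient
        w_head -= lr * grad
    return w_head

rng = np.random.default_rng(2)
w_feat = rng.normal(size=(16, 8))   # stands in for a downloaded, possibly backdoored layer
original = w_feat.copy()
x_new = rng.random((50, 16))        # hypothetical images from a new task
y_new = rng.integers(0, 4, size=50)
w_head = fine_tune_head(x_new, y_new, w_feat, n_classes=4)
print(np.array_equal(w_feat, original))  # True: the inherited layer is unchanged
```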

Dolan-Gavitt says this research shows the security and auditing practices currently in use aren’t enough. In addition to better ways of understanding what’s contained in neural networks, security practices for validating trusted neural networks need to be established.