We don’t understand how AI make most decisions, so now algorithms are explaining themselves

Self-awareness, or the ability to understand and explain oneself, is one of the largest divides between artificial and human intelligence. While we may not fully understand ourselves, we can offer up a rationale for our decisions in most cases. On the other hand, AI algorithms are usually only programmed to provide an answer based on the data they’ve learned. That is, we can see their conclusions, but most of the time we don’t know how they arrived at them. That limits our ability to improve AI when something goes wrong, as well as learn from them when they make a decision that wouldn’t occur to us. Now, a growing field of research is looking to change that.

By Dave Gershgorn5 min readUpdated July 21, 2022

Add QZ to Google

Consider how AI identifies people in a picture. Given a constellation of data points (pictures of people), AI finds patterns (different individuals), and draws a line (specific person). Often, those constellations of data are so complex that it’s tough to retrace the line drawn by the machine. If it’s wrong, we’d have trouble figuring out why. New research from University of California, Berkeley, and the Max Planck Institute for Informatics might lead toward a solution, an AI algorithm that analyzes the data in two ways: one to answer the original question, and another that identifies the data used to answer the question so it can translate it into English. Instead of going back after the fact to see why something happened, it documents the process along the way.

The essential business news, delivered fresh every morning.

Join 500,000+ readers who start their day with Quartz.

By subscribing, you agree to our Terms of Service and Privacy Policy.

The algorithm’s only ability right now is to recognize human actions in pictures, like playing baseball or riding a bike, according to the unreviewed research paper posted on ArXiv (pdf). It’s trained on two sets of information, one to determine what is happening in a picture, and the other geared towards answering “why.” The first uses images of human activities, correlated with a description of the image and then an explanation of the specific task. The description might be of a man holding two juggling balls, while the explanation points to a third ball in motion. The second dataset consists of images with three associated questions, and 10 answers per question, like: “Is the person swimming? No, Because… the guy is nowhere near water.”

So when the neural network is asked to explain why it said a picture showed baseball, it looks back at the data used to come to that decision, identifies a bat and then a person in a position correlated with swinging a bat, and says, “The player is swinging a bat.” The researchers call it a “pointing and justification” system, as it can point to the data used to make a decision and justify why it was used that way.

Despite artificial intelligence algorithms’ popularity in voice recognition and automatic photo tagging, most of these systems are difficult to understand, even by their designers. If the software malfunctions, like when Google $GOOGL’s Photo app mistakenly tagged black people as gorillas, researchers can’t quickly gauge where their software went wrong.

“Engineers have developed deep learning systems that ‘work’—in that they can automatically detect the faces of cats or dogs, for example—without necessarily knowing why they work or being able to show the logic behind a system’s decision,” writes Microsoft $MSFT principal researcher Kate Crawford in the journal New Media & Society.

This problem strikes to the very heart of machine learning—when the algorithm learns, it takes data, such as pictures of humans doing tasks with bits of text, extracts the important information, and then sorts those pieces into constellations of data that only it can understand in its entirety. The learning process, which is done independently from direct human intervention, makes these algorithms unlike a car or a traffic light; we know exactly why these work, and how they were built, but neural networks throw that paradigm out the window.

“We did not ‘design’ [deep neural networks] in the traditional sense, we only designed their learning algorithms and fed them data,” says Boston University professor of computer science Kate Saenko. “The rest they learned on their own.”

This is why the Berkeley and Max Planck Institute research is important: it picks an idea from the mind of a machine and translates it for humans. Rather than displaying a decision as a series of mathematic equations, the machine can again do the heavy lifting to interpret its results.

“The difficulty is explaining the individual decisions in a way that is human interpretable,” says Virginia Tech’s Devi Parikh, who serves as a chair for the European Conference on Computer Vision, and did a stint as a visiting researcher at Facebook $META. If you were to look at the machine-readable reason for a decision, it would look like a set of enormously long strings of numbers, potentially hundreds of thousands of digits. But a system like the Berkeley and Max Planck Institute research takes those sets of numbers, finds the commonalities between them to determine what the machine was looking at, and describes it in a human-readable sentence.

Their work isn’t a complete fix for the problem—it only works in a very specific scenario. But it does point toward a future where we can simply ask machines to explain their actions, and get an easy, clear answer. This will become increasingly important as we put ever more critical decisions in the hands of AI, like, say, driving our cars.

If they learn to lie, however, that’s a whole other story.