
Baidu’s AI learned English by learning to find apples in a maze


A lot of artificial intelligence today learns by trial and error. As humans, we take this concept (learning something by attempting it again and again) for granted, because our brains have evolved over hundreds of thousands of years to learn new ideas quickly.

Our AI hasn’t evolved that capability (yet), leading a large swath of tech researchers to experiment with mimicking this biological process in code, an approach known as reinforcement learning. While this research area usually focuses on learning a single task, like playing a video game or training simulated animals to leap around, Chinese search giant Baidu is doing something a little different.

Baidu is developing an AI whose main objective is to find objects in an unfamiliar maze. But instead of letting the AI explore and learn on its own, a “teacher” algorithm tells it where to go using plain English. This way, the AI has to understand language to bolster its learning, and only by combining the two can it accomplish its task. Baidu’s approach is effectively a blueprint for what a robot would need to understand commands and navigate a house, office, or other physical space.

“We want to be able to teach a robot to do things in a human way, in a way that’s more convenient to humans and faster,” says Wei Xu, a researcher at Baidu who co-authored a newly released paper on the AI. “Language is a huge part of knowledge communication.”

So how does it work? Reward and punishment. Every time it hits a wall in the 2D, 7×7 block maze, the AI is punished. Every time it successfully locates an object (in this case a digital piece of fruit), the AI gets a reward. At least figuratively speaking: “Punishment” or “reward” in this case is determined by a number given to the AI indicating performance. (In other words, no algorithms were harmed in the making of this research paper.) 
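To make the reward-and-punishment idea concrete, here is a minimal sketch of a reward signal for a 7×7 grid maze. Everything in it is an assumption for illustration: the grid encoding, the function name `step`, and the reward values (-1 for hitting a wall, +1 for reaching the fruit, 0 otherwise) are hypothetical and are not the numbers Baidu used.

```python
# Hypothetical reward signal for a 7x7 grid maze. Reward values are assumptions.
GRID_SIZE = 7
WALL_PENALTY = -1.0  # "punishment" for bumping into a wall or the maze edge
GOAL_REWARD = 1.0    # "reward" for reaching the piece of fruit
STEP_REWARD = 0.0    # neutral score for an ordinary move

# Positions are (row, col) with row 0 at the north edge of the maze.
MOVES = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}


def step(pos, action, walls, goal):
    """Return (new_position, reward) after the agent tries to move."""
    d_row, d_col = MOVES[action]
    row, col = pos[0] + d_row, pos[1] + d_col
    # Leaving the grid or entering a wall cell is a collision:
    # the agent stays where it was and receives the penalty.
    if not (0 <= row < GRID_SIZE and 0 <= col < GRID_SIZE) or (row, col) in walls:
        return pos, WALL_PENALTY
    if (row, col) == goal:
        return (row, col), GOAL_REWARD
    return (row, col), STEP_REWARD
```

For example, `step((0, 0), "east", walls={(0, 1)}, goal=(3, 3))` returns `((0, 0), -1.0)`, because the cell to the east is a wall. The learning algorithm only ever sees these numbers; over many attempts it adjusts its behavior to collect more of the positive ones.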

If the teacher commands, for example, “Please move to the west of cabbage,” and the AI moves east of the cabbage, the teacher punishes it. Slowly, over millions of iterations, the AI learns to recognize the object associated with the word “cabbage,” what the word “west” denotes, and how the two concepts are related. The algorithm uses four parts to do this: a language module to understand commands and generate answers, a recognition module to identify key words, a visual model to see the maze, and an action model to make decisions.
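The teacher's side of this exchange is simple, because the teacher already knows where everything is in the maze; it is the learner that has to work out what “west” and “cabbage” mean. A toy sketch of how such a teacher might score a command, assuming a fixed “move to the <direction> of <object>” phrasing, grid coordinates, and a hypothetical function name `teacher_reward` (none of this is taken from Baidu’s paper):

```python
# Hypothetical teacher that scores the agent's final position against a command.
# Positions are (row, col) with row 0 at the north edge of the maze.
OFFSETS = {"north": (-1, 0), "south": (1, 0), "west": (0, -1), "east": (0, 1)}


def teacher_reward(command, agent_pos, objects):
    """Return +1.0 if the agent stands where the command asked, else -1.0.

    Assumes commands shaped like "Please move to the <direction> of <object>".
    `objects` maps each object name to the (row, col) cell it occupies.
    """
    words = command.lower().rstrip(".").split()
    direction, name = words[-3], words[-1]
    obj_row, obj_col = objects[name]
    d_row, d_col = OFFSETS[direction]
    # "West of the cabbage" is the cell one column to the cabbage's west.
    target = (obj_row + d_row, obj_col + d_col)
    return 1.0 if tuple(agent_pos) == target else -1.0
```

With the cabbage at `(3, 3)`, an agent that ends up at `(3, 2)` earns `1.0`, while one that wanders to `(3, 4)`, east of the cabbage, gets `-1.0`. The learner never sees this parsing logic, only the resulting score.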

To gauge how well the AI understands what it has learned, the Baidu team built the teacher to separately ask questions about the mazes. If the AI answers correctly, that suggests the algorithm has learned the spatial relationship between itself and various objects. Asking the AI where a banana is when the AI is in the southern part of the maze, for instance, would prompt the answer “north.”
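The correct answer to a “where is” question can be computed directly from grid positions, which is what the teacher would check the AI's reply against. A minimal sketch, assuming (row, col) coordinates with row 0 at the north edge, a hypothetical function name `answer_where`, and an invented tie-breaking rule that reports whichever axis has the larger offset:

```python
def answer_where(agent_pos, obj_pos):
    """Return a one-word compass answer to "Where is <object>?".

    Positions are (row, col) with row 0 at the north edge of the maze.
    """
    d_row = obj_pos[0] - agent_pos[0]
    d_col = obj_pos[1] - agent_pos[1]
    if d_row == 0 and d_col == 0:
        return "here"  # hypothetical answer for "same cell"
    # Report the axis with the larger offset (an assumed tie-breaking rule).
    if abs(d_row) >= abs(d_col):
        return "north" if d_row < 0 else "south"
    return "west" if d_col < 0 else "east"
```

An agent at `(6, 3)` in the southern part of the maze, asked about a banana at `(1, 3)`, should answer `"north"`. The single-word return value mirrors the bare-bones answers the article describes.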

While answering questions after looking at an image is typically seen as its own area of research, called visual question answering, for Baidu it’s a feature of how the AI learns language and navigation in tandem. It’s worth noting that the answers are extremely simplistic. The algorithm can’t generate a full sentence response, only the bare-bones answer.

Xu calls this research a proof of concept, meant to assess whether an algorithm could learn language and navigation simultaneously. The Baidu team hopes to soon scale their efforts up to 3D environments, but it’s already easy to imagine this approach applied to the real, tangible world. 

“Suppose a family member wants a coffee made in a specific way, like one spoonful of sugar or a different grind of coffee,” Xu says. “Without language for communication, the robot can’t know what the family needs.”

