The basis of this research is a subset of artificial intelligence research called reinforcement learning. An AI agent is made to repeat a task over and over with slight variations until the task can be completed. Researchers tinker with how the agent carries experience from one attempt to the next, but a large part of the research is figuring out how to tell the agent that an action was good or bad, a signal called a reward. In the sumo wrestling test, the fighters were programmed to get +1,000 points if they won, -1,000 points if they lost, and -1,000 points if the match ended in a tie.
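As a rough illustration of the sumo scoring described above, the end-of-match reward can be written as a small lookup. This is a minimal sketch, not the researchers' code: the names `Outcome` and `sumo_reward` are hypothetical, and only the three point values come from the description above.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of one sumo match (hypothetical names)."""
    WIN = "win"
    LOSS = "loss"
    TIE = "tie"


# Point values as reported for the sumo test.
# A tie is punished exactly as hard as a loss.
REWARDS = {
    Outcome.WIN: 1000,
    Outcome.LOSS: -1000,
    Outcome.TIE: -1000,
}


def sumo_reward(outcome: Outcome) -> int:
    """Return the end-of-match reward for one fighter."""
    return REWARDS[outcome]
```

Because a tie scores the same as a loss, an agent gains nothing by stalling, so only winning improves its score.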

In order to win, the agents naturally learned stances that made them more stable. They lowered their heads and torsos, and extended their arms to the side, similar to the stance associated with human sumo wrestlers. Arms were also used to hook and move opponents towards the ledge.

As the agents learned, the rewards in some tests needed to be changed. In a soccer-like game, researchers first rewarded the agents for learning to walk. After thousands of tries, the agents learned to walk, and then the reward was switched to +1,000 points for successfully defending or scoring (depending on which agent) plus bonus points for standing at the end of the round.
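The two-stage reward in the soccer-like game might look something like the sketch below. It rests on several assumptions: the function name and every parameter are hypothetical, the walking reward is modeled simply as forward progress, the switch between stages is a flag set by the caller, and the size of the standing bonus is left to the caller because it is not given here. Only the +1,000 figure comes from the description above.

```python
GOAL_REWARD = 1000.0  # reported reward for scoring or defending successfully


def soccer_reward(
    walking_learned: bool,    # assumption: caller decides when stage one is over
    forward_progress: float,  # assumption: stand-in for the walking reward
    achieved_goal: bool,      # scored or defended, depending on the agent's role
    still_standing: bool,     # upright at the end of the round
    standing_bonus: float,    # assumption: bonus size is not stated in the article
) -> float:
    """Return the reward for one round under a two-stage schedule."""
    if not walking_learned:
        # Stage one: reward only progress toward walking.
        return forward_progress

    # Stage two: reward the game objective, plus a bonus for staying upright.
    reward = GOAL_REWARD if achieved_goal else 0.0
    if still_standing:
        reward += standing_bonus
    return reward
```

For example, under these assumptions `soccer_reward(True, 0.0, True, True, 10.0)` combines the goal reward with a bonus of 10, while the same call with `walking_learned=False` returns only the walking reward.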

But since the AI’s knowledge is learned slowly over thousands of iterations, researchers say it’s difficult to track exactly how the learning takes place or why. As the AI learned to sumo wrestle, one of the agents figured out how to fake its opponent out, luring it into lunging forward near the edge of the ring and then stepping out of the way. But what the team doesn’t know is whether the agent predicted that the strategy would help it win or whether it was merely an accident that got rewarded into a successful behavior.

While these exact skills, or the knowledge of how to walk in a specific simulation, might not be useful on their own, Mordatch says this research furthers the understanding of how agents learn complex goals in competitive games, as in the lab’s work on mastering the competitive video game Dota 2.
