When humans and artificial intelligence face off in a game, like chess or Go, it’s typically a one-against-one affair. Each player, human or AI, just has to outsmart a single opponent on a board that only changes when the players make a move.
OpenAI is announcing today (June 25) that its newest AI bots can hold their own as a team of five against human gamers at Dota 2, a multiplayer game popular in e-sports for its complexity and the teamwork it demands. The AI research lab is looking to take the bots to Dota 2 championship matches in August to compete against the pros.
Dota 2 is a challenging game for AI to master simply because of the sheer number of decisions the players have to juggle. Chess can end in fewer than 40 moves, and Go in fewer than 150; OpenAI’s Dota 2 bots, by contrast, make 20,000 moves over the course of a 45-minute game. OpenAI showed last year that the bots could go one-on-one against a human professional in a curated snippet of the game, but the company wasn’t entirely sure they could scale up to five against five.
But the research team doesn’t credit this breakthrough to a new technique or a lightbulb moment, but rather to a simple idea.
“As long as the AI can explore, it will learn, given enough time,” Greg Brockman, OpenAI’s chief technology officer, told Quartz.
The bots learn from self-play, meaning copies of the bots play against each other and learn from each side’s successes and failures. By using a huge stack of 256 graphics processing units (GPUs) backed by 128,000 processing cores, the researchers were able to speed up the AI’s gameplay so that the bots learned from the equivalent of 180 years of gameplay for every day of training. One version of the bots was trained for four weeks, meaning it played more than 5,000 years of the game.
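To make the self-play idea concrete, here is a rough sketch in Python: two copies of a single rock-paper-scissors policy play each other, and the shared policy is updated from each side’s wins and losses. The game, update rule, and constants are illustrative assumptions for this article, not OpenAI’s training system.

```python
import random

# A toy sketch of self-play, not OpenAI's training system: two copies of
# the same rock-paper-scissors policy play each other, and the shared
# policy is nudged toward whichever moves won. The game, update rule,
# and constants are all illustrative assumptions.

ACTIONS = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}
policy = {a: 1.0 for a in ACTIONS}  # both players draw moves from this policy

def sample_move():
    total = sum(policy.values())
    return random.choices(ACTIONS, weights=[policy[a] / total for a in ACTIONS])[0]

for episode in range(100_000):
    move_a, move_b = sample_move(), sample_move()  # the bot plays a copy of itself
    if move_a == move_b:
        continue  # a draw teaches nothing in this toy game
    winner = move_a if BEATS[move_a] == move_b else move_b
    loser = move_b if winner == move_a else move_a
    # Learn from both sides of the game: reward the winning move and
    # penalize the losing one, keeping every weight positive.
    policy[winner] += 0.01
    policy[loser] = max(0.01, policy[loser] - 0.01)

print(policy)  # hovers around the uniform strategy, this game's equilibrium
```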
“We just kept waiting for the magic to run out. We kept waiting to hit a wall, and we never seemed to hit a wall,” Brockman said, referring to the diminishing returns the team expected, but never saw, from training with ever more computing power.
OpenAI invests a lot of time in perfecting a form of AI called reinforcement learning, where a bot is given agency to make choices and is later told whether those choices led to a good or bad outcome. In OpenAI’s research, the bot’s behavior is completely random at first, and is then gradually shaped toward the behaviors that helped it reach the ultimate goal. For a robot learning to stack blocks, the good outcome might be a group of properly stacked blocks. In the case of OpenAI’s Dota bots, it’s winning the game.
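As a toy illustration of that loop (a sketch, not OpenAI’s code), here is a classic gradient-bandit learner in Python: behavior starts out completely random, and action preferences are nudged toward choices that produced better-than-average outcomes. The two-armed setup and all constants are assumptions chosen for readability.

```python
import math
import random

# A minimal sketch of the reinforcement-learning loop described above,
# on a toy two-armed bandit rather than Dota 2. The payoffs, update rule,
# and constants are illustrative assumptions, not OpenAI's actual setup.

WIN_PROBABILITY = [0.3, 0.7]  # hypothetical chance each action pays off
preferences = [0.0, 0.0]      # learned preference for each action
LEARNING_RATE = 0.1
baseline = 0.0                # running average reward, for stable updates

def action_probabilities(prefs):
    """Softmax: turn raw preferences into a probability distribution."""
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

for step in range(5_000):
    probs = action_probabilities(preferences)
    # Behavior starts out completely random (equal preferences) ...
    action = random.choices([0, 1], weights=probs)[0]
    # ... then the environment reports a good or bad outcome ...
    reward = 1.0 if random.random() < WIN_PROBABILITY[action] else 0.0
    baseline += 0.01 * (reward - baseline)
    # ... and preferences shift toward choices that beat the baseline.
    for a in (0, 1):
        indicator = 1.0 if a == action else 0.0
        preferences[a] += LEARNING_RATE * (reward - baseline) * (indicator - probs[a])

print(action_probabilities(preferences))  # heavily favors the better arm
```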
In a match, the OpenAI team initially gave each bot a mandate to do as well as it could on its own, which meant the bots learned to act selfishly and steal kills from each other. But by turning up a single parameter, which shifts each bot’s reward toward a weighted average of the whole team’s success, the researchers got the bots to work together and execute team attacks faster than humanly possible. OpenAI dubbed the metric “team spirit.”
“They start caring more about team fighting, and saving one another, and working together in these skirmishes in order to make larger advances towards the group goal,” says Brooke Chan, an engineer at OpenAI.
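The article describes team spirit as a weighted average of individual and team success; one plausible sketch of that blending is below. The coefficient name tau and the exact formula are assumptions inferred from that description, not OpenAI’s published implementation.

```python
# A rough sketch of reward blending under a "team spirit" coefficient.
# The parameter name tau and the exact formula are assumptions inferred
# from the description above, not OpenAI's published implementation.

def blended_reward(own_reward: float, team_rewards: list[float], tau: float) -> float:
    """Interpolate between selfish reward and the team's average reward.

    tau = 0.0 -> each bot optimizes only its own reward (selfish play)
    tau = 1.0 -> each bot optimizes the team-wide average (full teamwork)
    """
    team_average = sum(team_rewards) / len(team_rewards)
    return (1.0 - tau) * own_reward + tau * team_average

# Only one of five bots secures a kill worth 5.0 reward:
team = [0.0, 0.0, 0.0, 0.0, 5.0]
print(blended_reward(0.0, team, tau=0.0))  # 0.0: a selfish teammate gains nothing
print(blended_reward(0.0, team, tau=0.8))  # 0.8: shared credit rewards teamwork
```

With the coefficient turned up, a kill secured by one bot still pays off for its four teammates, which is exactly the incentive that makes saving one another and coordinating skirmishes worthwhile.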
Right now, the bots are restricted to playing certain characters, and they can’t use wards (items that let players see more of the map), items that grant invisibility, or spells that summon other units to fight alongside them. OpenAI hopes to lift those restrictions before the competition in August.