Student of Games: The algorithm that wins at chess and poker

A group of experts has developed an artificial intelligence program that learns to win at very different games simply by being given their rules

A grandmother teaching her grandson how to play chess. Branimir76 (Getty Images/iStockphoto)

Nov 19, 2023 - 01:32 CET

A new algorithm called Student of Games can win at a range of board and card games, including chess, Go, Texas Hold’em poker and the strategy board game Scotland Yard. The artificial intelligence (AI) program combines guided search, machine learning and game theory, according to the researchers who developed it, in a paper published in the journal Science Advances.

Earlier systems such as AlphaZero could master only games of perfect information, such as chess and Go, in which every player can see the entire state of the game. AlphaZero could not win at poker, however, because poker is a game of imperfect information: each player’s cards are hidden from their opponents.

The research was carried out while the experts were working at DeepMind, Google’s AI research division. However, several team members left Google in January 2022, and the company laid off most of the remaining team members in January of this year.

The tool can now win at a wide range of games with minimal prior knowledge. “Our algorithm is capable of reasoning based on the rules of the games. For example, it learns to play all of them (chess, poker, Go or Scotland Yard) using only the rules, without being given any other information,” explains Finbarr Timbers, who works at the research lab Midjourney and is one of the authors of the study. “With this, it can determine what actions can be taken and whether you’ve won or lost,” he continues.

To decide which move to make at each moment, the algorithm relies on a technique called “counterfactual regret minimization,” which analyzes all possible plays. “Regret,” according to Timbers, means “how well you could have done if you had played optimally, minus how well you actually played.” For example, if you won 200 chips over a series of poker hands but could have won 1,000 by playing differently, your regret is 800 chips. The goal of Student of Games is to reduce that regret as much as possible. To do so, it considers every possible scenario consistent with the cards that are face up, that is, the public information, and averages over them.
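The chip example above can be written out as a short calculation. This is a minimal Python sketch, assuming a hypothetical table of chip totals for the line of play actually chosen and for one alternative; the function names and labels are invented for illustration and are not taken from Student of Games.

```python
def regrets(payoffs, played):
    """Per-option regret: how many more chips each option would have won
    than the option actually played (0 for the option played itself)."""
    actual = payoffs[played]
    return {option: value - actual for option, value in payoffs.items()}


def overall_regret(payoffs, played):
    """Regret as Timbers defines it: the best achievable result minus the
    result actually obtained."""
    return max(payoffs.values()) - payoffs[played]


# Hypothetical chip totals matching the example in the text.
chips = {"line_played": 200, "best_alternative": 1000}

print(overall_regret(chips, "line_played"))  # 800
print(regrets(chips, "line_played"))  # {'line_played': 0, 'best_alternative': 800}
```

In counterfactual regret minimization, running totals of such per-option regrets are kept at every decision point, and the next strategy puts more probability on the options with larger positive regret.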

In two-player zero-sum games like these, minimizing regret drives the algorithm’s average strategy toward a Nash equilibrium, the concept devised by American mathematician John Nash. In a Nash equilibrium, each player’s strategy is the best response to the strategies of the others, so no player can improve their result by changing strategy on their own. Timbers and his colleagues used this as a foundation for training the algorithm, allowing it to find an optimal strategy in most situations.
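To see how minimizing regret leads to a Nash equilibrium, here is a minimal sketch of regret matching, the update rule at the core of counterfactual regret minimization, applied in self-play to rock-paper-scissors rather than poker. The game, the iteration count and the initial bias toward rock are assumptions chosen for illustration; this is not the Student of Games implementation. Each player plays in proportion to its positive cumulative regret, and it is the average strategy over all iterations that approaches the equilibrium of one third per action.

```python
# Payoff for (own action, opponent action) in rock-paper-scissors
# (0 = rock, 1 = paper, 2 = scissors). The game is symmetric, so the
# same table serves both players.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]
N = 3


def regret_matching(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def self_play(iterations=50_000):
    # Assumed starting point: player 0 is biased toward rock, so the
    # dynamics do not begin at the equilibrium.
    cum_regret = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sum = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        # Both players' strategies are fixed before either updates.
        strategies = [regret_matching(r) for r in cum_regret]
        for player in (0, 1):
            mine, theirs = strategies[player], strategies[1 - player]
            # Expected payoff of each pure action against the opponent's mix.
            action_value = [
                sum(PAYOFF[a][b] * theirs[b] for b in range(N)) for a in range(N)
            ]
            current_value = sum(mine[a] * action_value[a] for a in range(N))
            for a in range(N):
                cum_regret[player][a] += action_value[a] - current_value
                strategy_sum[player][a] += mine[a]
    # The average strategy, not the last one, converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sum]


if __name__ == "__main__":
    for player, strategy in enumerate(self_play()):
        print(player, [round(p, 3) for p in strategy])
```

The current strategies keep cycling (rock, then paper, then scissors), but their running average settles near the uniform mix, where no player can gain by deviating.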

Different games call for different kinds of reasoning. In chess, when you’re in a given position on the board, you can search through the possible moves to find the best one. Poker doesn’t work like that. Timbers explains that you have to consider how a play affects other situations: “If you bet aggressively every time you have a strong hand, you’ll reveal to your opponent that you have a good hand. Likewise, if you stop betting when you have a weak hand, you’ll reveal to your opponent what your hand is.”
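Timbers’s point about betting patterns can be made concrete with Bayes’ rule. This sketch assumes a hypothetical opponent who knows your betting frequencies, and that half of your hands are strong; the probabilities are illustrative, not figures from the study. It computes how confident the opponent can be that you hold a strong hand after seeing you bet.

```python
def prob_strong_given_bet(p_strong, p_bet_if_strong, p_bet_if_weak):
    """Opponent's belief that your hand is strong after seeing a bet (Bayes' rule)."""
    p_bet = p_strong * p_bet_if_strong + (1 - p_strong) * p_bet_if_weak
    return p_strong * p_bet_if_strong / p_bet


# Predictable player: always bets strong hands, never bets weak ones.
print(prob_strong_given_bet(0.5, 1.0, 0.0))  # 1.0: a bet gives the hand away

# Player who also bets (bluffs) with half of their weak hands.
print(prob_strong_given_bet(0.5, 1.0, 0.5))  # about 0.667: the bet is ambiguous
```

Mixing bluffs into the betting is one reason strong poker strategies are randomized: a bet alone no longer tells the opponent what the hand is.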

The British company DeepMind, owned by Google since 2014, developed an algorithm called R-NaD that plays Stratego at the level of an expert human. Stratego is a popular board game in which each player has 40 pieces and must either capture the opponent’s flag or leave the opponent with no pieces to move. R-NaD relies on algorithmic techniques to perform well without using search. For this reason, it is not as strong as Student of Games: “The literature has historically shown that algorithms that search among possible actions are usually better at games than algorithms that don’t use search… but they’re also slower and more expensive to train,” Timbers explains.
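The advantage of searching can be illustrated on a toy game far simpler than Stratego. In this hypothetical game, players alternately take one to three stones from a pile, and whoever takes the last stone wins. An exhaustive game-tree search (negamax) finds the winning move, while a fixed rule that never looks ahead, here “always take three,” can walk into a lost position. This is only a sketch of the general idea, not how R-NaD or Student of Games work.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # stones a player may take per turn


@lru_cache(maxsize=None)
def negamax(stones):
    """Value of the position for the player about to move:
    +1 if they can force a win, -1 otherwise (exhaustive search)."""
    if stones == 0:
        return -1  # the opponent just took the last stone and won
    return max(-negamax(stones - m) for m in MOVES if m <= stones)


def search_move(stones):
    """Pick the move whose resulting position is worst for the opponent."""
    legal = [m for m in MOVES if m <= stones]
    return max(legal, key=lambda m: -negamax(stones - m))


def greedy_move(stones):
    """A no-search rule of thumb: always take as many stones as allowed."""
    return min(3, stones)


print(search_move(10))  # 2, leaving 8 stones, a lost position for the opponent
print(greedy_move(10))  # 3, leaving 7 stones
print(negamax(7))       # 1: the player to move from 7 stones can force a win
```

Caching the search results keeps this toy search cheap, but in games like Go or poker the tree is vastly larger, which is why search-based methods are slower and more expensive, as Timbers notes.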

Competitive AI is used to measure the effectiveness of computer programs and to obtain a better gaming experience. However, it can also have negative implications.

“It’s very possible that cheating occurs on [gambling] websites and in similar games. Many competitive video games try to be strict about the software allowed on each player’s computer, to ensure that [AI isn’t being used], something Riot Games already does with Valorant [a shooter game],” says Diego Rodríguez-Ponga Albalá, the founder and director of Pontica Digital Solutions. He points out that it’s foreseeable “that very sophisticated artificial intelligence will be developed to automatically detect whether the player is human or not.”

Gema Ruiz, head of Innovation at Softtek EMEA, points out other limitations of the algorithm, such as its use of betting abstractions in poker and its “computational costs.” An abstraction groups similar plays so that they are treated in the same way, reducing the complexity of the game. When the algorithm trains on poker, it uses random betting abstractions to cut the number of possible actions from 20,000 to just four or five.

Ruiz emphasizes that, in the future, the study suggests the algorithm could use “a broader policy that can handle a variety of actions in game situations, with a large number of possible decisions.” Because having the algorithm enumerate every possible move is expensive, the researchers propose a “generative model”: the algorithm would generate a sample of possible strategies and operate only on that sampled subset, rather than listing every single option.
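A betting abstraction of the kind Ruiz describes can be sketched in a few lines. This example assumes a hypothetical abstraction with five actions (fold, call, half-pot bet, pot-sized bet and all-in) and maps each raw bet size to the nearest abstract size, the simplest possible mapping; the action names, the pot of 100 chips and the stack of 20,000 chips are illustrative assumptions, not the configuration used in the paper. The hypothetical `sample_bets` helper stands in for the sampling idea: it uses a uniform random draw, whereas the researchers propose a generative model.

```python
import random

# Hypothetical abstraction: bet sizes expressed as fractions of the pot.
BET_FRACTIONS = {"half_pot": 0.5, "pot": 1.0}


def abstract_bet(bet, pot, stack):
    """Map a raw bet size to the label of the nearest abstract bet."""
    sizes = {name: frac * pot for name, frac in BET_FRACTIONS.items()}
    sizes["all_in"] = stack
    return min(sizes, key=lambda name: abs(sizes[name] - bet))


def abstract_actions(pot, stack):
    """All abstract actions reachable from every legal bet size from 1 to stack."""
    bets = {abstract_bet(b, pot, stack) for b in range(1, stack + 1)}
    return {"fold", "call"} | bets


def sample_bets(stack, k, rng):
    """Instead of enumerating every bet size, work with k sampled ones.
    (A uniform draw stands in for a learned generative model.)"""
    return sorted(rng.sample(range(1, stack + 1), k))


actions = abstract_actions(pot=100, stack=20_000)
print(len(actions), sorted(actions))
# 5 ['all_in', 'call', 'fold', 'half_pot', 'pot']
print(sample_bets(stack=20_000, k=5, rng=random.Random(0)))
```

The trade-off Ruiz points to is visible here: 20,000 distinct bet sizes collapse into three betting labels, so a bet of 5,000 chips is treated exactly like a pot-sized bet of 100.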

For Ruiz, the tool is “a promising contender in the field of gaming algorithms driven by artificial intelligence.” She highlights “its ability to improve performance with increased computational resources, together with solid theoretical foundations.”
