Training Graph Neural Networks with Policy Gradients to Perform Tree Search

Matthew Macfarlane, Diederik M. Roijers, Herke van Hoof

Research output: Unpublished paper


Monte Carlo Tree Search (MCTS) has been shown to perform well on decision problems such as board games and Atari games, but it relies on heuristic design decisions that are non-adaptive and not necessarily optimal for all problems. Learned policies and value functions can augment MCTS by leveraging the state information at the nodes in the search tree. However, these learned functions do not take the search tree structure into account and can be sensitive to value estimation errors. In this paper, we propose a new method that, using Reinforcement Learning, learns how to expand the search tree and make decisions using Graph Neural Networks. This enables the policy to fully leverage the search tree and to learn how to search based on the specific problem. First, we show, in an environment with limited state information, that the policy is able to leverage information from the search tree. Finally, we find that the method outperforms popular baselines on two diverse problems known to require planning: Sokoban and the Travelling Salesman Problem.
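The core idea of the abstract, a Graph Neural Network over the search tree whose node-expansion policy is trained with policy gradients, can be sketched minimally as follows. The toy tree, the single parent-to-child message-passing round, and all variable names are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy search tree (root = node 0): per-node features and parent pointers.
features = rng.normal(size=(5, 4))     # 5 nodes, 4 features each
parents = [-1, 0, 0, 1, 1]             # parents[i] is node i's parent
leaves = [2, 3, 4]                     # candidate nodes to expand next

W_msg = 0.1 * rng.normal(size=(4, 4))  # message-passing weights
w_out = 0.1 * rng.normal(size=4)       # linear scoring head

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def forward(x):
    """One parent-to-child message-passing round, then score each leaf."""
    msgs = np.zeros_like(x)
    for i, p in enumerate(parents):
        if p >= 0:
            msgs[i] = x[p] @ W_msg     # child receives parent's message
    h = np.tanh(x + msgs)
    scores = np.array([h[i] @ w_out for i in leaves])
    return h, scores

h, scores = forward(features)
probs = softmax(scores)                # policy over which leaf to expand
action = rng.choice(len(leaves), p=probs)

# REINFORCE-style update for the scoring head: grad log pi(action) * return.
ret = 1.0                              # placeholder episode return
grad_scores = np.eye(len(leaves))[action] - probs
grad_w_out = sum(g * h[i] for g, i in zip(grad_scores, leaves))
w_out += 0.01 * ret * grad_w_out
```

In this sketch the tree structure enters the policy only through message passing, so scores over candidate expansions depend on neighbouring nodes rather than on a node's state in isolation, which is the property the abstract highlights.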
Original language: Dutch
Number of pages: 12
Status: Published - Dec 2022
Event: Deep Reinforcement Learning Workshop at NeurIPS 2022 - New Orleans, United States
Duration: 9 Dec 2022 → …


Conference: Deep Reinforcement Learning Workshop at NeurIPS 2022
Abbreviated title: DeepRL@NeurIPS
Country/Region: United States
City: New Orleans
Period: 9/12/22 → …
Internet address
