This paper presents Bayesian Ultra-Q Learning, a variant of Q-Learning adapted to multi-agent games with independent learning agents. Bayesian Ultra-Q Learning extends the Bayesian Hyper-Q Learning algorithm proposed by Tesauro and is more efficient at solving adaptive multi-agent games. Whereas Hyper-Q agents update only the Q-values of the single state they observe, Ultra-Q exploits the fact that similar states are likely to yield similar rewards and therefore also updates the Q-values of nearby states.
We assess the performance of our Bayesian Ultra-Q Learning algorithm against three variants of Hyper-Q as defined by Tesauro, and against Infinitesimal Gradient Ascent (IGA) and Policy Hill Climbing (PHC) agents. We do so by evaluating the agents in two normal-form games, namely the zero-sum game of rock-paper-scissors and a cooperative stochastic hill-climbing game. In rock-paper-scissors, games of Bayesian Ultra-Q agents against IGA agents end in draws in which, averaged over time, all players play the Nash equilibrium, meaning no player can exploit another. Against PHC, neither Bayesian Ultra-Q nor Hyper-Q agents are able to win on average, which contradicts the findings of Tesauro.
In the cooperation game, Bayesian Ultra-Q converges towards an optimal joint strategy and vastly outperforms all other algorithms, including Hyper-Q, which fail to find a strong equilibrium due to relative overgeneralisation.