Efficient Bayesian Ultra-Q Learning for Multi-Agent Games

Ward Gauderis, Fabian Denoodt, Bram Silue, Pierre Vanvolsem, Andries Rosseau

Research output: Conference paper



This paper presents Bayesian Ultra-Q Learning, a variant of Q-Learning [12] adapted for solving multi-agent games with independent learning agents. Bayesian Ultra-Q Learning is an extension of the Bayesian Hyper-Q Learning algorithm proposed by Tesauro [11] that is more efficient for solving adaptive multi-agent games. While Hyper-Q agents merely update the Q-table corresponding to a single state, Ultra-Q leverages the information that similar states most likely result in similar rewards and therefore updates the Q-values of nearby states as well.
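The idea of spreading an observed reward to similar states can be illustrated with a small sketch. This is not the paper's exact formulation: the Gaussian similarity kernel, the `bandwidth` parameter, and the function name `ultra_q_update` are illustrative assumptions.

```python
import numpy as np

def ultra_q_update(Q, states, s_idx, a, reward, alpha=0.1, bandwidth=0.2):
    """Similarity-weighted Q-update (illustrative sketch, not the
    paper's exact rule): the reward observed in state s_idx also
    nudges the Q-values of nearby states, weighted by a Gaussian
    kernel on the distance between state representations."""
    dists = np.linalg.norm(states - states[s_idx], axis=1)
    weights = np.exp(-(dists / bandwidth) ** 2)  # 1 at s_idx, decays with distance
    # Every state's Q-value for action a moves toward the observed
    # reward in proportion to its similarity to the visited state.
    Q[:, a] += alpha * weights * (reward - Q[:, a])
    return Q
```

In this sketch, a classic Hyper-Q-style update corresponds to a weight vector that is 1 at the visited state and 0 elsewhere; the kernel generalises the observation to its neighbourhood.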
We assess the performance of our Bayesian Ultra-Q Learning algorithm against three variants of Hyper-Q as defined by Tesauro, and against Infinitesimal Gradient Ascent (IGA) [9] and Policy Hill Climbing (PHC) [1] agents. We do so by evaluating the agents in two normal-form games, namely the zero-sum game of rock-paper-scissors and a cooperative stochastic hill-climbing game. In rock-paper-scissors, games of Bayesian Ultra-Q agents against IGA agents end in draws where, averaged over time, all players play the Nash equilibrium, meaning no player can exploit another. Against PHC, neither Bayesian Ultra-Q nor Hyper-Q agents are able to win on average, which contradicts the findings of Tesauro [11].

In the cooperation game, Bayesian Ultra-Q converges towards an optimal joint strategy and vastly outperforms all other algorithms, including Hyper-Q, which are unsuccessful in finding a strong equilibrium due to relative overgeneralisation.
Original language: English
Pages (from-to): 1-7
Number of pages: 7
Journal: Proc. of the Adaptive and Learning Agents Workshop (ALA 2023)
Journal issue number: 1
Status: Published - May 2023
Event: 2023 Adaptive and Learning Agents Workshop at AAMAS - London, United Kingdom
Duration: 29 May 2023 - 30 May 2023

