Efficient Bayesian Ultra-Q Learning for Multi-Agent Games

Ward Gauderis, Fabian Denoodt, Bram Silue, Pierre Vanvolsem, Andries Rosseau

Research output: Contribution to journal, Conference paper


Abstract

This paper presents Bayesian Ultra-Q Learning, a variant of Q-Learning [12] adapted for solving multi-agent games with independent learning agents. Bayesian Ultra-Q Learning is an extension of the Bayesian Hyper-Q Learning algorithm proposed by Tesauro [11] that is more efficient for solving adaptive multi-agent games. While Hyper-Q agents merely update the Q-table corresponding to a single state, Ultra-Q leverages the information that similar states most likely result in similar rewards and therefore updates the Q-values of nearby states as well.

We assess the performance of our Bayesian Ultra-Q Learning algorithm against three variants of Hyper-Q as defined by Tesauro, and against Infinitesimal Gradient Ascent (IGA) [9] and Policy Hill Climbing (PHC) [1] agents. We do so by evaluating the agents in two normal-form games, namely, the zero-sum game of rock-paper-scissors and a cooperative stochastic hill-climbing game. In rock-paper-scissors, games of Bayesian Ultra-Q agents against IGA agents end in draws where, averaged over time, all players play the Nash equilibrium, meaning no player can exploit another. Against PHC, neither Bayesian Ultra-Q nor Hyper-Q agents are able to win on average, which goes against the findings of Tesauro [11].

In the cooperation game, Bayesian Ultra-Q converges in the direction of an optimal joint strategy and vastly outperforms all other algorithms including Hyper-Q, which are unsuccessful in finding a strong equilibrium due to relative overgeneralisation.
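
The idea stated in the abstract, that an update for one state can also inform the Q-values of nearby states, might be sketched roughly as follows. This is an illustrative sketch only and not the authors' algorithm: the Gaussian similarity kernel, the `bandwidth` parameter, the state coordinates in `states`, and the function name `neighbourhood_q_update` are all assumptions made for the example, and the Bayesian estimation of the opponent's strategy is left out entirely.

```python
import numpy as np


def neighbourhood_q_update(q, states, s_idx, a_idx, reward, next_value,
                           alpha=0.1, gamma=0.9, bandwidth=0.1):
    """Apply one TD update to state s_idx and, down-weighted, to nearby states.

    q      : array of shape (n_states, n_actions), modified in place
    states : array of shape (n_states, d) with state coordinates
             (a hypothetical embedding, e.g. discretised opponent strategies)
    """
    # Distance from the visited state to every state in the table.
    dists = np.linalg.norm(states - states[s_idx], axis=1)
    # Assumed Gaussian similarity: 1 at the visited state, decaying with distance.
    weights = np.exp(-0.5 * (dists / bandwidth) ** 2)
    # Standard one-step Q-learning target.
    target = reward + gamma * next_value
    # Every state moves towards the same target, scaled by its similarity weight.
    q[:, a_idx] += alpha * weights * (target - q[:, a_idx])
    return q


# Usage: five states on a line, two actions, one update at the middle state.
states = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
q = np.zeros((5, 2))
neighbourhood_q_update(q, states, s_idx=2, a_idx=0, reward=1.0, next_value=0.0)
```

With a very small `bandwidth` the weights vanish everywhere except at the visited state, so the sketch reduces to an ordinary single-state Q-update, which corresponds to the contrast the abstract draws with Hyper-Q.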
Original language: English
Article number: 57
Pages (from-to): 1-7
Number of pages: 7
Journal: Proc. of the Adaptive and Learning Agents Workshop (ALA 2023)
Volume: https://alaworkshop2023.github.io/
Issue number: 1
Publication status: Published - May 2023
Event: 2023 Adaptive and Learning Agents Workshop at AAMAS - London, United Kingdom
Duration: 29 May 2023 to 30 May 2023
https://alaworkshop2023.github.io
