1 Citation (Scopus)


Multi-agent decision making is prevalent in many real-world applications, such as wind farm control [11], traffic light control [12] and warehouse commissioning [5]. In these settings, agents need to cooperate to maximize a shared team reward [3]. Coordination in multi-agent settings is challenging, due to the combinatorial increase in terms of the number of agents. Therefore, it is computationally intractable to consider all agents' actions jointly. Fortunately, in many real-world settings, an agent is only directly influenced by a small subset of neighboring agents. In this case, the team reward can be factorized over the groups of agents that influence each other. Such a sparse factorization must be exploited to keep multi-agent decision problems tractable. In this work, we consider learning to coordinate in multi-agent multi-armed bandit problems (Section 1), and propose the Multi-Agent Thompson Sampling (MATS) algorithm (Section 2), which exploits sparse interactions in multi-agent systems. We assume that the groups of interacting agents are known beforehand, which is often the case in real-world applications. Our method uses the exploration-exploitation mechanism of Thompson sampling (TS). TS has been shown to achieve high empirical performance [4]. Moreover, TS is a Bayesian method, which allows for the specification of prior knowledge through belief distributions. We argue that this is an important property to have in many practical applications, such as influenza mitigation [7, 8]. We compare MATS against Sparse Cooperative Q-Learning (SCQL) and Multi-Agent Upper Confidence Exploration (MAUCE) on two synthetic settings, i.e., Bernoulli 0101-Chain and Gem Mining (Section 3). MATS improves upon the state of the art with respect to accuracy, learning speed and computational speed.

Original languageEnglish
Title of host publicationProceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
EditorsBo An, Amal El Fallah Seghrouchni, Gita Sukthankar
Number of pages3
ISBN (Electronic)978-1-4503-7518-4
Publication statusPublished - 31 May 2020
EventThe 19th International Conference on Autonomous Agents and Multi-Agent Systems - Auckland, New Zealand
Duration: 9 May 202013 May 2020
Conference number: 19

Publication series

NameProceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
ISSN (Print)2523-5699
ISSN (Electronic)2523-5699


ConferenceThe 19th International Conference on Autonomous Agents and Multi-Agent Systems
Abbreviated titleAAMAS 2020
CountryNew Zealand
Internet address

Fingerprint Dive into the research topics of 'Thompson Sampling for Factored Multi-Agent Bandits'. Together they form a unique fingerprint.

Cite this