Abstract

We present a novel model-based algorithm, Cooperative Prioritized Sweeping, for sample-efficient learning in large multi-agent Markov decision processes. Our approach leverages domain knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment in order to determine which state-action pairs are most likely to need updating, significantly increasing learning speed. Batch updates can then be performed that efficiently back-propagate knowledge throughout the value function. Our method outperforms the state-of-the-art sparse cooperative Q-learning and QMIX algorithms on the well-known SysAdmin benchmark, on randomized environments, and on a fully observable variation of the well-known firefighter benchmark from the Dec-POMDP literature.
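
For readers unfamiliar with the underlying idea, the following is a minimal sketch of classic single-agent, tabular prioritized sweeping. It is not the cooperative multi-agent method described in the abstract and makes no use of a dynamic decision network. It only illustrates the two ingredients the abstract mentions: a learned model used to decide which state-action pairs most need updating, and batched updates that propagate value backwards. The environment `chain_step`, the least-tried-action exploration rule, and all parameter names and defaults are assumptions chosen for illustration.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(step, n_states, n_actions, start, episodes=5,
                         max_steps=100, gamma=0.9, alpha=1.0, theta=1e-6,
                         batch=10):
    """Tabular prioritized sweeping for a deterministic environment.

    `step(s, a)` is a hypothetical environment function that returns
    (next_state, reward, done).
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    counts = [[0] * n_actions for _ in range(n_states)]
    model = {}                        # (s, a) -> (next_state, reward, done)
    predecessors = defaultdict(set)   # state -> {(s, a)} observed to reach it
    queue = []                        # min-heap on negated priority

    def td_error(s, a):
        s2, r, done = model[(s, a)]
        target = r if done else r + gamma * max(Q[s2])
        return target - Q[s][a]

    def push_if_needed(s, a):
        priority = abs(td_error(s, a))
        if priority > theta:
            heapq.heappush(queue, (-priority, s, a))

    for _ in range(episodes):
        s = start
        for _ in range(max_steps):
            # Assumption: deterministic exploration that picks the
            # least-tried action, used only to keep the example reproducible.
            a = min(range(n_actions), key=lambda act: counts[s][act])
            counts[s][a] += 1
            s2, r, done = step(s, a)

            # Update the learned model and queue the pair just experienced.
            model[(s, a)] = (s2, r, done)
            predecessors[s2].add((s, a))
            push_if_needed(s, a)

            # Batch of planning updates, highest priority first.
            for _ in range(batch):
                if not queue:
                    break
                _, ps, pa = heapq.heappop(queue)
                Q[ps][pa] += alpha * td_error(ps, pa)
                # The value of ps changed, so its predecessors may be stale.
                for qs, qa in predecessors[ps]:
                    push_if_needed(qs, qa)

            if done:
                break
            s = s2
    return Q


def chain_step(s, a):
    """Hypothetical 5-state chain: action 0 moves left, action 1 moves right.

    Reaching state 4 gives reward 1 and ends the episode.
    """
    s2 = max(s - 1, 0) if a == 0 else s + 1
    done = s2 == 4
    return s2, (1.0 if done else 0.0), done


Q = prioritized_sweeping(chain_step, n_states=5, n_actions=2, start=0)
greedy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
print(greedy)
```

On this chain the single reward is found at the far end, and one batch of prioritized updates is enough to carry its value back to the start state, so the greedy policy moves right in every non-terminal state.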

Original language: English
Title: Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021
Publisher: IFAAMAS
Pages: 160-168
Number of pages: 9
ISBN (electronic): 9781713832621
Status: Published - 2021
Event: The 20th International Conference on Autonomous Agents and Multiagent Systems (Virtual)
Duration: 3 May 2021 to 7 May 2021
https://aamas2021.soton.ac.uk/

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 1
ISSN (print): 1548-8403
ISSN (electronic): 1558-2914

Conference

Conference: The 20th International Conference on Autonomous Agents and Multiagent Systems
Abbreviated title: AAMAS 2021
Period: 3/05/21 to 7/05/21
