We present a novel model-based algorithm, Cooperative Prioritized Sweeping, for sample-efficient learning in large multi-agent Markov decision processes. Our approach leverages domain knowledge about the structure of the problem in the form of a dynamic decision network. Using this information, our method learns a model of the environment to determine which state-action pairs are most likely to need updating, significantly increasing learning speed. Batch updates can then be performed that efficiently back-propagate knowledge throughout the value function. Our method outperforms the state-of-the-art sparse cooperative Q-learning and QMIX algorithms on the well-known SysAdmin benchmark, on randomized environments, and on a fully observable variation of the well-known firefighter benchmark from the Dec-POMDP literature.
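To make the idea of prioritized, batched value updates concrete, the following is a minimal sketch of classic single-agent, tabular prioritized sweeping for a deterministic environment. It is not the cooperative, factored algorithm described in the abstract, and it does not use a dynamic decision network. The class name `PrioritizedSweeping`, its parameters (`alpha`, `gamma`, `theta`, `batch_size`), and the integer state and action encoding are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


class PrioritizedSweeping:
    """Illustrative single-agent tabular prioritized sweeping.

    Assumes a deterministic environment with integer states and actions.
    All names and defaults here are hypothetical, not taken from the paper.
    """

    def __init__(self, n_actions, alpha=0.5, gamma=0.9, theta=1e-4, batch_size=10):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.theta = alpha, gamma, theta
        self.batch_size = batch_size
        self.q = defaultdict(float)           # (state, action) -> value estimate
        self.model = {}                       # (state, action) -> (reward, next_state)
        self.predecessors = defaultdict(set)  # next_state -> {(state, action)}
        self.queue = []                       # min-heap of (-priority, state, action)

    def _td_error(self, s, a):
        # One-step error of Q(s, a) under the learned model.
        r, s2 = self.model[(s, a)]
        best_next = max(self.q[(s2, b)] for b in range(self.n_actions))
        return r + self.gamma * best_next - self.q[(s, a)]

    def _push(self, s, a):
        # Queue a pair only if its expected change exceeds the threshold.
        priority = abs(self._td_error(s, a))
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, s, a))

    def observe(self, s, a, r, s2):
        # Record the transition in the model, then run a batch of updates.
        self.model[(s, a)] = (r, s2)
        self.predecessors[s2].add((s, a))
        self._push(s, a)
        self._sweep()

    def _sweep(self):
        # Update the highest-priority pairs first, then re-prioritize
        # the pairs that lead into each updated state.
        for _ in range(self.batch_size):
            if not self.queue:
                break
            _, s, a = heapq.heappop(self.queue)
            self.q[(s, a)] += self.alpha * self._td_error(s, a)
            for ps, pa in self.predecessors[s]:
                self._push(ps, pa)
```

In this sketch, a reward observed late in a chain of states is propagated back to earlier states within the same batch, because each update queues the predecessors of the state whose value just changed.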
|Title of host publication||Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021|
|Number of pages||9|
|Publication status||Published - 2021|
|Event||The 20th International Conference on Autonomous Agents and Multiagent Systems - Virtual, 3 May 2021 → 7 May 2021|
|Conference||The 20th International Conference on Autonomous Agents and Multiagent Systems|
|Abbreviated title||AAMAS 2021|
|Period||3/05/21 → 7/05/21|