Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Abstract

Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observation. Modern MARL methods have focused on finding factorized value functions. While successful, the resulting methods have convoluted network structures. We take a radically different approach and build on the structure of independent Q-learners. Our algorithm, LAN, leverages a dueling architecture to represent decentralized policies as separate individual advantage functions with respect to a centralized critic that is discarded after training. The critic acts as a stabilizer that coordinates learning and is used to formulate the DQN targets. This lets LAN keep the number of parameters of its centralized network independent of the number of agents, without imposing additional constraints such as monotonic value functions.
When evaluated on SMAC, LAN shows state-of-the-art performance overall and achieves a win rate above 80% on two super-hard maps where even QPLEX obtains almost no wins. Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.
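
The dueling idea described in the abstract can be illustrated with a minimal sketch. It assumes a combination of the form Q_i(a) = V + (A_i(a) - max over a' of A_i(a')), where V comes from the centralized critic and A_i from agent i's local advantage network; the exact formulation in the paper may differ. The function names and the max-normalization of the advantages are hypothetical choices made for illustration only.

```python
from typing import List


def local_q_values(central_value: float, advantages: List[float]) -> List[float]:
    """Assumed dueling combination: Q_i(a) = V + (A_i(a) - max_a' A_i(a')).

    `central_value` stands in for the centralized critic's output and
    `advantages` for one agent's local advantage estimates per action.
    """
    best = max(advantages)
    return [central_value + (adv - best) for adv in advantages]


def greedy_action(advantages: List[float]) -> int:
    """Decentralized execution: pick the action with the largest local advantage.

    The critic is not needed here, because adding the same value to every
    action does not change which action is largest.
    """
    return max(range(len(advantages)), key=lambda a: advantages[a])


def td_target(reward: float, gamma: float, next_central_value: float, done: bool) -> float:
    """DQN-style target built from the critic (an assumption of this sketch).

    With max-normalized advantages, the best next Q-value equals the
    critic's value of the next state, so the target is r + gamma * V(s').
    """
    return reward + (0.0 if done else gamma * next_central_value)
```

Under these assumptions, an agent acting greedily on its own advantages selects the same action it would select from the full Q-values, which is why the critic can be dropped once training is finished.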
Original language: English
Title of host publication: The 21st International Conference on Autonomous Agents and Multiagent Systems
Subtitle of host publication: Extended Abstract
Publisher: IFAAMAS
Publication status: Accepted/In press - 9 May 2022
Event: 21st International Conference on Autonomous Agents and Multi-agent System
Duration: 9 May 2022 to 13 May 2022
Conference number: 21
https://aamas2022-conference.auckland.ac.nz

Conference

Conference: 21st International Conference on Autonomous Agents and Multi-agent System
Abbreviated title: AAMAS
Period: 9/05/22 to 13/05/22
