Deep Multi-Agent Reinforcement Learning in a Homogeneous Open Population

Research output: Conference paper

4 Citations (Scopus)
176 Downloads (Pure)


Advances in reinforcement learning research have recently produced agents that are competent at, and sometimes exceed human performance in, complex tasks. Most interesting real-world problems, however, are not restricted to one agent; they involve multiple agents acting in the same environment, and they have proven challenging to solve. In this work we present a study of a homogeneous open population of agents modelled as a multi-agent reinforcement learning (MARL) system. We propose centralised learning with decentralised execution, in which all agents are given the same policy to execute individually. Using the SimuLane highway traffic simulator as a test-bed, we show experimentally that initialising the multi-agent scenario with a policy learnt in the single-agent setting, and then fine-tuning it to the task, outperforms agents that learn in the multi-agent setting from scratch. Specifically, we contribute an open-population MARL configuration, a method for transferring knowledge from a single-agent to a multi-agent setting, and a training procedure for a homogeneous open population of agents.
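The training scheme described in the abstract (one shared policy, pre-trained with a single agent and then fine-tuned in an open population) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the one-lane toy road stands in for SimuLane, tabular Q-learning stands in for the deep network, and the names `SharedQPolicy`, `observe`, and `train`, the reward values, and the hyperparameters are hypothetical rather than taken from the paper.

```python
import random
from collections import defaultdict

ROAD_LENGTH = 6          # hypothetical toy road, not the SimuLane simulator
WAIT, FORWARD = 0, 1


class SharedQPolicy:
    """One Q-table shared by all agents (tabular stand-in for a deep network)."""

    def __init__(self, alpha=0.2, gamma=0.9, epsilon=0.1):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = defaultdict(lambda: [0.0, 0.0])

    def act(self, obs, rng):
        # Decentralised execution: each agent queries the policy with its own observation.
        if rng.random() < self.epsilon:
            return rng.choice((WAIT, FORWARD))
        values = self.q[obs]
        return FORWARD if values[FORWARD] >= values[WAIT] else WAIT

    def update(self, obs, action, reward, next_obs, done):
        # Centralised learning: every agent's transition updates the same table.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])

    def copy(self):
        clone = SharedQPolicy(self.alpha, self.gamma, self.epsilon)
        for obs, values in self.q.items():
            clone.q[obs] = list(values)
        return clone


def observe(position, occupied):
    # Local observation: own cell and whether the cell ahead is taken.
    return (position, (position + 1) in occupied)


def train(policy, steps, max_agents, spawn_prob, seed=0):
    rng = random.Random(seed)
    positions = {}       # agent id -> cell index; agents enter and leave over time
    next_id = 0
    for _ in range(steps):
        # Open population: a new agent may enter whenever the entry cell is free.
        entry_free = 0 not in positions.values()
        if len(positions) < max_agents and entry_free and rng.random() < spawn_prob:
            positions[next_id] = 0
            next_id += 1
        for agent_id in list(positions):
            pos = positions[agent_id]
            occupied = set(positions.values())
            obs = observe(pos, occupied)
            action = policy.act(obs, rng)
            reward, done = -0.01, False          # small cost per time step
            if action == FORWARD:
                if pos + 1 in occupied:
                    reward = -1.0                # bumped into the agent ahead
                else:
                    pos += 1
                    if pos == ROAD_LENGTH - 1:
                        reward, done = 1.0, True
            positions[agent_id] = pos
            next_obs = observe(pos, set(positions.values()))
            policy.update(obs, action, reward, next_obs, done)
            if done:
                del positions[agent_id]          # agent leaves the population
    return policy


# Stage 1: learn with a single agent on the road.
pretrained = train(SharedQPolicy(), steps=2000, max_agents=1, spawn_prob=1.0)
# Stage 2: initialise the multi-agent run from that policy and fine-tune it.
finetuned = train(pretrained.copy(), steps=2000, max_agents=4, spawn_prob=0.5, seed=1)
# Baseline: learn in the multi-agent setting from scratch.
scratch = train(SharedQPolicy(), steps=2000, max_agents=4, spawn_prob=0.5, seed=1)
```

In this sketch the single-agent stage never observes a blocked cell, so reacting to other agents can only be learnt during fine-tuning. The sketch does not reproduce the paper's experiments or results; it only shows how the three training configurations relate to each other.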

Original language: English
Title: Artificial Intelligence
Subtitle: 30th Benelux Conference, BNAIC 2018, ’s-Hertogenbosch, The Netherlands, November 8–9, 2018, Revised Selected Papers
Editors: Martin Atzmueller, Wouter Duivesteijn
Publisher: Springer International Publishing
Number of pages: 15
ISBN (electronic): 978-3-030-31978-6
ISBN (print): 978-3-030-31977-9
Status: Published - 8 Nov 2018
Event: 30th Benelux Conference on Artificial Intelligence - ’s-Hertogenbosch, Netherlands
Duration: 8 Nov 2018 to 9 Nov 2018

Publication series

Name: Belgian/Netherlands Artificial Intelligence Conference
ISSN (print): 1568-7805


Conference: 30th Benelux Conference on Artificial Intelligence
Abbreviated title: BNAIC 2018

