Decentralised Reinforcement Learning in Markov Games

Thesis: Doctoral Thesis


This dissertation introduces a new approach to multi-agent reinforcement learning. We develop the Interconnected Learning Automata for Markov Games (MG-ILA) algorithm, in which each agent is composed of a network of independent learning units called learning automata (LA). These automata are relatively simple learners that can be composed into advanced collectives and provide very general convergence results.
An advantage of our approach is that it has very limited information requirements, since the automata coordinate using only their own reward signals. This allows us to view a multi-state learning problem as a single repeated normal-form game from classical game theory. We use this observation to develop a new analysis for multi-agent reinforcement learning. Using this method, we show the convergence of MG-ILA towards pure equilibrium points between agent policies.
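As a rough illustration of automata that coordinate using only their own reward signals, the sketch below lets two learning automata play a repeated two-action common interest game. The abstract does not name the update scheme, so the linear reward-inaction rule used here is an assumption, as are all names (`LearningAutomaton`, `play`, `COORDINATION_GAME`) and the payoff values. It is a single-state toy, not the MG-ILA algorithm itself.

```python
import random


class LearningAutomaton:
    """Assumed linear reward-inaction automaton over a finite action set."""

    def __init__(self, n_actions, learning_rate=0.05, rng=None):
        self.probs = [1.0 / n_actions] * n_actions  # start uniform
        self.learning_rate = learning_rate
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is expected in [0, 1]; a reward of 0 leaves probabilities
        # unchanged (the "inaction" part of the scheme).
        step = self.learning_rate * reward
        self.probs = [
            p + step * (1.0 - p) if i == action else p - step * p
            for i, p in enumerate(self.probs)
        ]


# Hypothetical common interest game: both players receive the same payoff.
COORDINATION_GAME = [[1.0, 0.0], [0.0, 0.6]]


def play(payoffs, rounds=5000, seed=0):
    """Two automata repeatedly play the game; each sees only its own reward."""
    rng = random.Random(seed)
    row = LearningAutomaton(len(payoffs), rng=rng)
    col = LearningAutomaton(len(payoffs[0]), rng=rng)
    for _ in range(rounds):
        i, j = row.choose(), col.choose()
        reward = payoffs[i][j]
        row.update(i, reward)  # no access to the other player's action
        col.update(j, reward)
    return row.probs, col.probs
```

Calling `play(COORDINATION_GAME)` returns the two final action probability vectors. Neither automaton observes the other's choice or payoff, which is the limited information setting the paragraph above describes.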
We then proceed by investigating the properties of our algorithm and proposing a number of extensions. Using results from evolutionary game theory, we analyse the learning dynamics of our system and develop a novel visualisation method for multi-state learning dynamics. We also show how an updated learning rule is able to overcome local optima and achieve global optimality in common interest problems. In conflicting interest cases, we show how this technique can be combined with a simple coordination mechanism to ensure a fair distribution of payoffs amongst all agents.
We conclude the dissertation by examining some possible applications of our system. We start by applying MG-ILA to multi-robot navigation and coordination simulations. We show that, even when only partial state information is available, the algorithm still finds an equilibrium between robot policies. We also consider applications in the field of swarm intelligence. We demonstrate how our system can be used as a model for systems using stigmergetic communication. In these settings, agents exchange information by sharing local pheromone signals. Our model allows us to apply our game-theoretic analysis to this class of algorithms, providing a new method to analyse the global results of local pheromone interactions.
Date of award: 11 Mar 2010
Original language: English
Supervisors: Ann Nowe (Promotor) & Katja Verbeeck (Co-promotor)
