## Abstract

Learning Automata (LA) were recently shown to be valuable tools for designing Multi-Agent Reinforcement Learning algorithms. One of the principal contributions of LA theory is that a set of decentralized, independent learning automata is able to control a finite Markov Chain with unknown transition probabilities and rewards. This result was recently extended to Markov Games and analyzed with the use of limiting games. In this paper we continue this analysis, but we assume that our agents are fully ignorant of the other agents in the environment, i.e. they can only observe themselves: they do not know how many other agents are present, which actions those agents took, which rewards they received for them, or which location they occupy in the state space. We prove that in Markov Games where agents have this limited type of observability, a network of independent LA is still able to converge to an equilibrium point of the underlying limiting game, provided a common ergodic assumption holds and provided the agents do not interfere with each other's transition probabilities.
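The idea of independent automata that observe only their own action and reward can be illustrated with a minimal sketch. The abstract does not name an update scheme, so the linear reward-inaction rule used here is an assumption, as are the class name `LinearRewardInaction`, the function `play`, and the payoff values. The example is a single-state repeated game, which is a simplification of the Markov Game setting and not the paper's construction.

```python
import random


class LinearRewardInaction:
    """One learning automaton using the linear reward-inaction update (assumed scheme)."""

    def __init__(self, n_actions, learning_rate=0.05, rng=None):
        self.probs = [1.0 / n_actions] * n_actions  # start from a uniform policy
        self.learning_rate = learning_rate
        self.rng = rng if rng is not None else random.Random()

    def choose(self):
        # Sample an action index according to the current action probabilities.
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is expected in [0, 1]; a reward of 0 leaves the probabilities unchanged.
        step = self.learning_rate * reward
        self.probs = [
            p + step * (1.0 - p) if i == action else p - step * p
            for i, p in enumerate(self.probs)
        ]


def play(rounds=5000, seed=0):
    """Two automata repeatedly play a common-payoff game with hypothetical payoffs."""
    rng = random.Random(seed)
    # Hypothetical probability of a unit reward for each joint action.
    payoff = [[0.9, 0.1], [0.1, 0.6]]
    agents = [LinearRewardInaction(2, rng=rng) for _ in range(2)]
    for _ in range(rounds):
        a0 = agents[0].choose()
        a1 = agents[1].choose()
        reward = 1.0 if rng.random() < payoff[a0][a1] else 0.0
        # Each automaton sees only its own action and the reward it received,
        # never the other agent's action or even the existence of that agent.
        agents[0].update(a0, reward)
        agents[1].update(a1, reward)
    return [agent.probs for agent in agents]


if __name__ == "__main__":
    print(play())
```

Calling `play()` returns the final action probabilities of both automata. Which joint action they settle on can depend on the seed and the learning rate.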

Original language | English |
---|---|
Pages (from-to) | 171-183 |
Number of pages | 13 |
Journal | Technical report in mathematics and computer science |
Publication status | Published - 2007 |
Event | Seventh European Symposium on Adaptive and Learning Agents and Multi-Agent Systems (ALAMAS'07), Maastricht University, Maastricht, Netherlands. Duration: 2 Apr 2007 → 3 Apr 2007 |

## Keywords

- learning automata
- multi-agent
- partial observability
- limiting game