Many real-world scenarios involve multiple agents that must cooperate to achieve a shared goal. In smart grids, for example, many decentralized entities must communicate in order to share the energy in the grid effectively. The multi-agent setting is challenging because the joint state and action spaces grow exponentially with the number of agents, which makes efficient learning hard and renders computing optimal solutions intractable.
In most multi-agent problems, however, each agent directly interacts with only a small subset of the environment. This underlying structure allows for a compact representation of the problem, which can speed up learning and, combined with approximations, can be used to tackle the intractability of the optimal solution. By theoretically bounding these approximations, we can also guarantee the quality of a given algorithm when it is applied to different models.
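A minimal sketch of this idea, under the common assumption that the global value decomposes into local terms over neighboring agents (a coordination graph); the graph, payoff tables, and all numbers below are purely illustrative, not part of the proposal itself:

```python
import itertools

# Hypothetical setup: 3 cooperating agents, each choosing a binary action.
N_AGENTS = 3
ACTIONS = [0, 1]

# Edges of an assumed coordination graph: each agent interacts only with
# its neighbors, so every local Q-term depends on just two agents' actions.
EDGES = [(0, 1), (1, 2)]

# Illustrative local payoff tables, one per edge: (a_i, a_j) -> value.
local_q = {
    (0, 1): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 3.0},
    (1, 2): {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0},
}

def global_q(joint_action):
    """Global value as a sum of local terms -- the compact representation.

    Storing the full joint table would need |A|^N entries; the factored
    form only needs one small table per edge of the coordination graph.
    """
    return sum(local_q[(i, j)][(joint_action[i], joint_action[j])]
               for (i, j) in EDGES)

# Brute-force maximization over the 2^3 joint actions, for illustration;
# with the factored form, methods such as variable elimination exploit the
# graph structure to avoid enumerating the exponential joint action space.
best = max(itertools.product(ACTIONS, repeat=N_AGENTS), key=global_q)
print(best, global_q(best))
```

The point of the sketch is the representation: the exponential joint Q-table is replaced by a handful of small local tables, which is what makes both learning and (approximate) planning tractable.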
To do this, we propose to use multi-agent model-based reinforcement learning, as it provides a suitable framework for these challenges. We focus on three main objectives. First, assuming the structure is known, we will develop novel algorithms that learn to act efficiently in large-scale sequential environments. Second, we will extend these algorithms to scenarios where the structure is not fixed but depends on the current state of the environment. Third, we will develop a theoretical framework that bounds the quality of the approximate solutions with respect to the optimal one.
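As an illustration of the kind of guarantee the third objective targets, a classical result from single-agent MDP theory bounds the loss of acting greedily with respect to an approximate value function: if the approximation error is at most $\varepsilon$ in the sup-norm, then

```latex
\left\| \hat{Q} - Q^{*} \right\|_{\infty} \le \varepsilon
\quad \Longrightarrow \quad
V^{*}(s) - V^{\hat{\pi}}(s) \le \frac{2\gamma\varepsilon}{1-\gamma}
\quad \text{for all } s,
```

where $\hat{\pi}$ is the greedy policy with respect to $\hat{Q}$ and $\gamma$ is the discount factor. Extending bounds of this flavor to structured multi-agent approximations is the goal, not an existing result of this proposal.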