Abstract
We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and the solution concept. A key insight is that a critic that learns a multi-variate distribution over the returns, combined with the accumulated rewards, lets us optimize the utility function directly, even if it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods requiring linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks, and show that it learns effectively where baseline approaches fail.
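The following is a minimal sketch of the idea in the abstract, not the authors' implementation. It assumes a hypothetical critic that outputs a categorical distribution (support points `atoms` with probabilities `probs`) over two-objective future returns, and a hypothetical non-linear utility (the product of the two objectives). The action names and all numbers are made up for illustration. It scores each action by the expected utility of accumulated rewards plus future return, and contrasts that with the utility of the expected return.

```python
import numpy as np


def utility(v):
    """Hypothetical non-linear utility: product of the two objectives."""
    return v[..., 0] * v[..., 1]


def expected_utility(accrued, atoms, probs):
    """E[u(accrued + Z)] for a categorical distribution over return vectors Z."""
    return float(np.sum(probs * utility(accrued + atoms)))


# Hypothetical critic output per action: support points of the two-objective
# future return and their probabilities (illustrative values only).
critic = {
    "risky": (np.array([[4.0, 0.0], [0.0, 4.0]]), np.array([0.5, 0.5])),
    "balanced": (np.array([[1.5, 1.5]]), np.array([1.0])),
}

# Rewards accumulated so far in the episode (assumed zero here).
accrued = np.array([0.0, 0.0])

# Score each action by the expected utility of the full return.
scores = {
    action: expected_utility(accrued, atoms, probs)
    for action, (atoms, probs) in critic.items()
}
best = max(scores, key=scores.get)

# For contrast: utility of the expected return, which ignores the spread
# of the distribution and can rank the actions differently.
naive = {
    action: float(utility(accrued + probs @ atoms))
    for action, (atoms, probs) in critic.items()
}

print(scores, best, naive)
```

In this toy setting the two criteria disagree: the expected-utility score prefers `balanced`, while the utility of the expected return prefers `risky`, because averaging the return vectors first hides the fact that every individual outcome of `risky` has zero utility.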
Original language | English |
---|---|
Article number | 23 |
Number of pages | 30 |
Journal | Autonomous Agents and Multi-Agent Systems |
Volume | 37 |
Issue number | 2 |
DOIs | |
Status | Published - 23 Apr 2023 |
Bibliographical note
Funding Information: Conor F. Hayes is funded by the University of Galway Hardiman Scholarship. This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.
Publisher Copyright:
© 2023, Springer Science+Business Media, LLC, part of Springer Nature.
Copyright:
Copyright 2023 Elsevier B.V., All rights reserved.
Fingerprint
Dive into the research topics of 'Actor-critic multi-objective reinforcement learning for non-linear utility functions'. Together they form a unique fingerprint.
Projects
- 1 Active
- VLAAI1: Vlaams Artificiële Intelligentie Onderzoeksprogramma (VAIOP) – second cycle
1/01/24 → 31/12/28
Project: Applied