Actor-critic multi-objective reinforcement learning for non-linear utility functions

Research output: Special issue › peer review

Abstract

We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and the solution concept. A key insight is that, by proposing a critic that learns a multivariate distribution over the returns, which is then combined with accumulated rewards, we can directly optimize the utility function, even if it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks, and show that it learns effectively where baseline approaches fail.
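To make the key insight concrete, here is a minimal sketch in Python of why a distributional critic helps: when the utility is non-linear, the expected utility of the total return differs from the utility of the expected return, so the agent must reason about the distribution of future returns added to rewards already accumulated. The Gaussian critic parameterisation, the threshold utility, and all names and values below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Toy non-linear utility over a 2-objective return vector (illustrative,
# not from the paper): objective 0 only yields utility once objective 1
# is non-negative, so u(E[R]) != E[u(R)] in general.
def utility(returns):
    return returns[..., 0] * (returns[..., 1] >= 0.0)

def expected_utility(acc_reward, mean, cov, n_samples=10_000, seed=0):
    """Monte Carlo estimate of E[u(R_past + R_future)], where the critic
    is assumed to model the future multi-objective return as a
    multivariate Gaussian (an illustrative parameterisation)."""
    rng = np.random.default_rng(seed)
    future = rng.multivariate_normal(mean, cov, size=n_samples)
    return utility(acc_reward + future).mean()

# Rewards accumulated so far, plus the critic's distribution over what
# is still to come; an actor would be updated to increase this estimate.
acc = np.array([1.0, -0.5])
mean, cov = np.array([0.5, 1.0]), np.diag([0.2, 0.3])
print(expected_utility(acc, mean, cov))
```

Note that for a linear utility the expectation could be pushed inside u, and a scalar expected-return critic would suffice; it is precisely the non-linearity that calls for a critic over the full multivariate return distribution.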
Original language: English
Article number: 23
Number of pages: 30
Journal: Autonomous Agents and Multi-Agent Systems
Volume: 37
Issue: 2
DOIs
Status: Published - 23 Apr 2023

Bibliographical note

Funding Information:
Conor F. Hayes is funded by the University of Galway Hardiman Scholarship. This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.

Publisher Copyright:
© 2023, Springer Science+Business Media, LLC, part of Springer Nature.

