Actor-critic multi-objective reinforcement learning for non-linear utility functions

Research output: Contribution to journal, Special issue, peer-review

Abstract

We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and in terms of the solution concept. The key insight is that a critic which learns a multivariate distribution over the returns, combined with the accumulated rewards, lets us optimize the utility function directly, even if it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods requiring linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
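The role of the return distribution and the accumulated rewards can be illustrated with a small numerical sketch. It is not the algorithm from the article: it assumes the quantity being optimized is the expected utility of the full return vector (accumulated rewards plus future returns), which the abstract does not spell out, and it replaces the learned critic with hand-written discrete distributions. All names (`expected_utility`, `utility_of_mean`, `product_utility`, the `risky` and `safe` actions) and all numbers are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum of two reward vectors."""
    return tuple(x + y for x, y in zip(a, b))


def expected_utility(accumulated: Vector,
                     future_returns: Sequence[Vector],
                     probs: Sequence[float],
                     utility: Callable[[Vector], float]) -> float:
    """E[u(accumulated + Z)] for a discrete distribution over future returns Z."""
    return sum(p * utility(add(accumulated, z))
               for z, p in zip(future_returns, probs))


def utility_of_mean(accumulated: Vector,
                    future_returns: Sequence[Vector],
                    probs: Sequence[float],
                    utility: Callable[[Vector], float]) -> float:
    """u(accumulated + E[Z]): what a critic that only tracks the mean could compute."""
    dims = len(accumulated)
    mean = tuple(sum(p * z[i] for z, p in zip(future_returns, probs))
                 for i in range(dims))
    return utility(add(accumulated, mean))


def product_utility(v: Vector) -> float:
    """A hypothetical non-linear utility: the product of both objectives."""
    return v[0] * v[1]


# Rewards already collected in the current episode (assumed values).
accumulated = (1.0, 0.0)

# Stand-ins for a critic's predicted return distributions, one per action.
risky = ([(4.0, 0.0), (0.0, 4.0)], [0.5, 0.5])
safe = ([(1.5, 1.5)], [1.0])

eu_risky = expected_utility(accumulated, *risky, product_utility)
eu_safe = expected_utility(accumulated, *safe, product_utility)
um_risky = utility_of_mean(accumulated, *risky, product_utility)
um_safe = utility_of_mean(accumulated, *safe, product_utility)

print("expected utility:", eu_risky, eu_safe)
print("utility of mean: ", um_risky, um_safe)
```

Under these assumed numbers, scoring actions by the utility of the mean return prefers the risky action, whereas scoring by the expected utility over the distribution prefers the safe one. The two rankings coincide only when the utility is linear, which is one way to see why a critic that models a distribution over return vectors, evaluated together with the rewards already accumulated, is useful in the non-linear case.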
Original language: English
Article number: 23
Number of pages: 30
Journal: Autonomous Agents and Multi-Agent Systems
Volume: 37
Issue number: 2
Publication status: Published - 23 Apr 2023

Bibliographical note

Funding Information:
Conor F. Hayes is funded by the University of Galway Hardiman Scholarship. This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.

Publisher Copyright:
© 2023, Springer Science+Business Media, LLC, part of Springer Nature.

Copyright:
Copyright 2023 Elsevier B.V., All rights reserved.
