Near On-Policy Experience Sampling in Multi-Objective Reinforcement Learning

Shang Wang, Mathieu Reymond, Athirai A. Irissappane, Diederik M. Roijers

Research output: Conference paper


Abstract

In multi-objective decision problems, the same state-action pair corresponds to different optimal policies under different preference weights between the objectives. Introducing changing preference weights therefore interferes with the convergence of the network and can even prevent it from converging. In this paper, we propose a novel experience sampling strategy for multi-objective RL problems that samples transitions based on weight and state similarities, so that the sampled experiences are close to on-policy. We apply our sampling strategy in multi-objective deep RL algorithms on known benchmark problems and show that it strongly improves performance.
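The abstract describes the sampler only at a high level. Below is a minimal Python sketch of the general idea, not the authors' implementation. All names (`Transition`, `near_on_policy_sample`, and the helpers) are hypothetical. The specific choices are assumptions made for illustration: cosine similarity between preference weights, a Gaussian kernel on the Euclidean distance between states, multiplying the two into one score, and drawing a batch with probability proportional to that score.

```python
# Hypothetical sketch of similarity-based replay sampling for multi-objective RL.
# ASSUMPTIONS (not from the paper): cosine similarity for weights, Gaussian kernel
# for states, product of the two as the sampling score, proportional sampling.
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Transition:
    state: List[float]
    action: int
    reward: List[float]      # one reward component per objective
    next_state: List[float]
    weight: List[float]      # preference weights active when the transition was collected


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two weight vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def state_similarity(a: Sequence[float], b: Sequence[float], bandwidth: float = 1.0) -> float:
    """Gaussian kernel on Euclidean distance: 1.0 for identical states, decays with distance."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def near_on_policy_sample(
    buffer: List[Transition],
    current_weight: Sequence[float],
    current_state: Sequence[float],
    batch_size: int,
    eps: float = 1e-6,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Sample transitions (with replacement) favouring similar weights and states."""
    rng = rng or random.Random()
    scores = [
        # Negative cosine values are clipped to 0; eps keeps every transition reachable.
        max(cosine_similarity(t.weight, current_weight), 0.0)
        * state_similarity(t.state, current_state)
        + eps
        for t in buffer
    ]
    return rng.choices(buffer, weights=scores, k=batch_size)
```

For example, `near_on_policy_sample(buffer, current_weight=[0.7, 0.3], current_state=s, batch_size=32)` would mostly return transitions that were collected under weights close to `[0.7, 0.3]` and in states near `s`. The small `eps` term is a design choice of this sketch: it keeps dissimilar experience from being excluded entirely, in the spirit of "near" rather than strictly on-policy sampling.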
Original language: English
Title of host publication: International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022
Publisher: IFAAMAS
Pages: 1756-1758
Number of pages: 3
ISBN (electronic): 9781713854333
Status: Published - 9 May 2022
Event: 21st International Conference on Autonomous Agents and Multi-agent System
Duration: 9 May 2022 to 13 May 2022
Conference number: 21
https://aamas2022-conference.auckland.ac.nz

Publication series

Name: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
Volume: 3
ISSN (print): 1548-8403
ISSN (electronic): 1558-2914

Conference

Conference: 21st International Conference on Autonomous Agents and Multi-agent System
Abbreviated title: AAMAS
Period: 9/05/22 to 13/05/22
Internet address: https://aamas2022-conference.auckland.ac.nz

Bibliographical note

Funding Information:
This research was supported by funding from the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” program.

Publisher Copyright:
© 2022 International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.

Copyright:
Copyright 2022 Elsevier B.V., All rights reserved.
