Trustworthy and Explainable Offline Reinforcement Learning by Inferring a Discrete-State Discrete-Action MDP from a Continuous-State Continuous-Action dataset

Research output: Chapter in Book/Report/Conference proceeding › Conference paper


Abstract

Offline Reinforcement Learning makes it possible to learn a controller for a system from a history of states, actions and rewards, without requiring interaction with the system or a simulator of it. Current Offline RL approaches mainly build on Off-policy RL algorithms, such as Q-Learning or TD3, with small extensions to prevent the algorithm from diverging due to the inability to try actions in real time. In this paper, we observe that these incremental approaches mostly lead to low-quality and untrustworthy policies. We then propose an Offline RL method built from the ground up, based on inferring a discrete-state and discrete-action MDP from the continuous states and actions in the dataset, and then solving the discrete MDP with Value Iteration. Our empirical evaluation shows the promise of our approach and calls for more research in Offline RL with dedicated algorithms.
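
The pipeline described in the abstract can be read as: discretize the continuous states and actions in the dataset, estimate a tabular MDP from transition counts, and solve it with Value Iteration. The sketch below illustrates this idea only; it is not the authors' implementation. The use of k-means for discretization, the count-based model estimates, and all names and hyperparameters (n_state_bins, n_action_bins, gamma) are illustrative assumptions.

```python
# Minimal sketch: offline dataset -> discrete MDP -> Value Iteration.
# Not the paper's code; clustering choice and hyperparameters are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def fit_discrete_mdp(states, actions, rewards, next_states,
                     n_state_bins=50, n_action_bins=10, gamma=0.99):
    # Cluster continuous states and actions into discrete bins.
    s_kmeans = KMeans(n_clusters=n_state_bins, n_init=10).fit(states)
    a_kmeans = KMeans(n_clusters=n_action_bins, n_init=10).fit(actions)
    s = s_kmeans.predict(states)
    a = a_kmeans.predict(actions)
    s_next = s_kmeans.predict(next_states)

    # Count-based estimates of transition probabilities and expected rewards.
    # Unvisited (state, action) pairs keep zero reward and an all-zero
    # transition row in this simple sketch.
    counts = np.zeros((n_state_bins, n_action_bins, n_state_bins))
    reward_sum = np.zeros((n_state_bins, n_action_bins))
    for si, ai, ri, sni in zip(s, a, rewards, s_next):
        counts[si, ai, sni] += 1
        reward_sum[si, ai] += ri
    sa_counts = counts.sum(axis=2)
    P = counts / np.maximum(sa_counts[:, :, None], 1)   # P[s, a, s']
    R = reward_sum / np.maximum(sa_counts, 1)            # R[s, a]

    # Standard Value Iteration on the estimated tabular MDP.
    V = np.zeros(n_state_bins)
    for _ in range(1000):
        Q = R + gamma * (P @ V)                           # Q[s, a]
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-6:
            break
        V = V_new
    policy = Q.argmax(axis=1)                             # greedy discrete policy
    return s_kmeans, a_kmeans, policy
```

At deployment time, a new continuous observation would be mapped to its state cluster with `s_kmeans.predict`, the discrete policy looked up, and the chosen action bin mapped back to a continuous action (for example, the cluster centroid in `a_kmeans.cluster_centers_`).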
Original language: English
Title of host publication: Proceedings of the 2024 Benelux Conference on Artificial Intelligence
Publisher: Benelux Association for Artificial Intelligence (BNVKI-AIABN)
Pages: 1-16
Number of pages: 14
ISBN (Electronic): 1568-7805
Publication status: Published - 18 Nov 2024
Event: BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning - Jaarbeurs Supernova, Utrecht, Netherlands
Duration: 18 Nov 2024 - 20 Nov 2024
Conference number: 36
https://bnaic2024.sites.uu.nl/

Publication series

Name: BNAIC Proceedings
Publisher: BNVKI
ISSN (Electronic): 1568-7805

Conference

Conference: BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning
Abbreviated title: BNAIC/BeNeLearn 2024
Country/Territory: Netherlands
City: Utrecht
Period: 18/11/24 - 20/11/24
