Abstract
Offline Reinforcement Learning learns a controller for a system from a history of states, actions and rewards, without interacting with the system or a simulator of it. Current Offline RL approaches mainly build on Off-policy RL algorithms, such as Q-Learning or TD3, with small extensions that prevent the algorithm from diverging due to the inability to try actions in real time. In this paper, we observe that these incremental approaches mostly lead to low-quality and untrustworthy policies. We then propose an Offline RL method built from the ground up, based on inferring a discrete-state and discrete-action MDP from the continuous states and actions in the dataset, and then solving the discrete MDP with Value Iteration. Our empirical evaluation shows the promise of our approach and calls for more research on dedicated Offline RL algorithms.
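As a rough illustration of the last step mentioned in the abstract (solving a discrete MDP with Value Iteration), the sketch below estimates a tabular model from logged transitions by simple counting and then runs Value Iteration on it. It is not the paper's method: it assumes states and actions have already been mapped to integer indices (how the paper infers the discrete MDP from continuous data is not reproduced here), and the function names, the counting estimator, the masking of unseen state-action pairs, and the toy dataset are all hypothetical choices made for this example.

```python
import numpy as np

def build_tabular_mdp(transitions, n_states, n_actions):
    """Estimate P[s, a, s'] and R[s, a] from (s, a, r, s') tuples by counting.

    Assumption: states and actions are already discrete integer indices.
    """
    counts = np.zeros((n_states, n_actions, n_states))
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1.0
        reward_sum[s, a] += r
    visits = counts.sum(axis=2)
    seen = visits > 0  # state-action pairs that occur in the dataset
    P = np.zeros_like(counts)
    P[seen] = counts[seen] / visits[seen][:, None]
    R = np.zeros_like(reward_sum)
    R[seen] = reward_sum[seen] / visits[seen]
    return P, R, seen

def value_iteration(P, R, seen, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value Iteration restricted to state-action pairs seen in the data."""
    V = np.zeros(P.shape[0])
    has_action = seen.any(axis=1)
    for _ in range(max_iter):
        Q = np.where(seen, R + gamma * (P @ V), -np.inf)
        # States with no logged action keep a value of 0 (an arbitrary choice).
        V_new = np.where(has_action, Q.max(axis=1), 0.0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = np.where(seen, R + gamma * (P @ V), -np.inf)
    policy = Q.argmax(axis=1)  # defaults to action 0 where nothing was seen
    return V, policy

# Hypothetical toy dataset: three states on a line, two actions (0 and 1).
transitions = [
    (0, 0, 0.0, 0),
    (0, 1, 0.0, 1),
    (1, 0, 0.0, 0),
    (1, 1, 1.0, 2),
    (2, 0, 0.0, 2),
]
P, R, seen = build_tabular_mdp(transitions, n_states=3, n_actions=2)
V, policy = value_iteration(P, R, seen, gamma=0.9)
print(V, policy)
```

Masking unseen state-action pairs is one simple way to keep the resulting policy within the support of the dataset; other choices (for example pessimistic value estimates) are equally possible.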
Original language | English |
---|---|
Title of host publication | Proceedings of the 2024 Benelux Conference on Artificial Intelligence |
Publisher | Benelux Association for Artificial Intelligence (BNVKI-AIABN) |
Pages | 1-16 |
Number of pages | 14 |
ISBN (Electronic) | 1568-7805 |
Publication status | Published - 18 Nov 2024 |
Event | BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning - Jaarbeurs Supernova, Utrecht, Netherlands. Duration: 18 Nov 2024 → 20 Nov 2024. Conference number: 36. https://bnaic2024.sites.uu.nl/ |
Publication series
Name | BNAIC Proceedings |
---|---|
Publisher | BNVKI |
ISSN (Electronic) | 1568-7805 |
Conference
Conference | BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning |
---|---|
Abbreviated title | BNAIC/BeNeLearn 2024 |
Country/Territory | Netherlands |
City | Utrecht |
Period | 18/11/24 → 20/11/24 |
Internet address | https://bnaic2024.sites.uu.nl/ |
Projects
2 Active
- VLAAI1: Flanders Artificial Intelligence Research program (FAIR) – second cycle
  1/01/24 → 31/12/28
  Project: Applied
- FWOSBO46: SBO Project: CTRL schemes merged with eXplainable AI for t(h)rustworthy control of physical dynamic systems (CTRLxAI=T(H)RUST)
  Nowe, A., Steckelmacher, D., Efthymiadis, K. & Schietgat, L.
  1/10/22 → 30/09/26
  Project: Applied