Abstract
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
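The pipeline sketched in the abstract (roll out a trained policy, encode continuous states into a discrete latent space, and estimate a latent MDP whose behaviour can then be checked with formal methods) can be illustrated with a toy example. This is a minimal sketch only: the 1-D environment, the policy, and the grid quantizer standing in for the learned discrete VAE encoder are all hypothetical stand-ins, not the paper's actual construction.

```python
import random
from collections import defaultdict

def step(s, a):
    """Hypothetical 1-D continuous MDP on [0, 1] with two actions."""
    drift = -0.1 if a == 0 else 0.1          # action 0 drifts left, 1 drifts right
    s2 = min(1.0, max(0.0, s + drift + random.uniform(-0.02, 0.02)))
    r = 1.0 if s2 > 0.9 else 0.0             # reward near the right edge
    return s2, r

def encode(s, n_bins=10):
    """Grid quantizer standing in for a learned discrete latent encoder."""
    return min(n_bins - 1, int(s * n_bins))

def build_latent_model(policy, episodes=200, horizon=50, n_bins=10):
    """Estimate latent transition probabilities and mean rewards from rollouts."""
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for _ in range(episodes):
        s = random.random()
        for _ in range(horizon):
            a = policy(s)
            s2, r = step(s, a)
            z, z2 = encode(s, n_bins), encode(s2, n_bins)
            counts[(z, a)][z2] += 1
            rewards[(z, a)].append(r)
            s = s2
    # Normalize counts into a discrete latent MDP (P, R).
    P = {k: {z2: c / sum(v.values()) for z2, c in v.items()}
         for k, v in counts.items()}
    R = {k: sum(v) / len(v) for k, v in rewards.items()}
    return P, R

random.seed(0)
policy = lambda s: 1                         # placeholder "trained" policy: always move right
P, R = build_latent_model(policy)
# Each latent transition distribution is a proper probability distribution.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in P.values())
```

The resulting finite `(P, R)` is the kind of discrete latent model on which off-the-shelf probabilistic model checkers can verify reachability or discounted-reward properties; the paper's contribution is to make the gap between this latent model and the true environment provably small via bisimulation bounds, which this sketch does not attempt.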
Original language | English |
---|---|
Publication status | Unpublished - 3 Jun 2022 |
Event | Belgium-Netherlands workshop on Reinforcement Learning 2022, Leiden University, Leiden, Netherlands. Duration: 3 Jun 2022 → 3 Jun 2022. https://rlg.liacs.nl/benerl-2022 |
Conference
Conference | Belgium-Netherlands workshop on Reinforcement Learning 2022 |
---|---|
Abbreviated title | BeNeRL 2022 |
Country | Netherlands |
City | Leiden |
Period | 3/06/22 → 3/06/22 |
Internet address | https://rlg.liacs.nl/benerl-2022 |
Keywords
- Reinforcement Learning
Projects
- 2 Active
- iBOF/21/027: DESCARTES - infectious DisEaSe eConomics and Ai with guaRanTEeS
  Nowe, A., Hens, N. & Beutels, P.
  1/01/21 → 31/12/24
  Project: Fundamental
- VLAAI1: Grant: Artificial Intelligence (AI) Research Programme Flanders
  1/07/19 → 31/12/23
  Project: Applied