Abstract
Although deep reinforcement learning (DRL) has many success stories, the large-scale deployment of policies learned through these advanced techniques in safety-critical scenarios is hindered by their lack of formal guarantees. Variational Markov Decision Processes (VAE-MDPs) are discrete latent space models that provide a reliable framework for distilling formally verifiable controllers from any RL policy. While the related guarantees address relevant practical aspects such as the satisfaction of performance and safety properties, the VAE approach suffers from several learning flaws (posterior collapse, slow learning speed, poor dynamics estimates), primarily due to the absence of abstraction and representation guarantees to support latent optimization. We introduce the Wasserstein auto-encoded MDP (WAE-MDP), a latent space model that fixes these issues by minimizing a penalized form of the optimal transport between the behaviors of the agent executing the original policy and the distilled policy, to which the formal guarantees apply. Our approach yields bisimulation guarantees while learning the distilled policy, allowing concrete optimization of the quality of the abstraction and representation model. Our experiments show that, besides distilling policies up to 10 times faster, the latent model quality is indeed generally better. Moreover, we present experiments with a simple time-to-failure verification algorithm on the latent space. The fact that our approach enables such simple verification techniques highlights its applicability.
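For readers unfamiliar with the Wasserstein auto-encoder framework the abstract builds on, the "penalized form of the optimal transport" refers to an objective of the following generic shape (shown here in standard WAE notation, not necessarily the paper's exact formulation; in the WAE-MDP setting the transport is taken between the behaviors induced by the original and distilled policies rather than between individual data points):

```latex
\min_{Q(Z \mid X)} \;
\mathbb{E}_{X \sim P_X}\, \mathbb{E}_{Z \sim Q(Z \mid X)}
\big[ c\big(X, G(Z)\big) \big]
\;+\; \lambda \, \mathcal{D}\big(Q_Z, P_Z\big)
```

Here $c$ is a reconstruction cost, $G$ the decoder, $Q_Z$ the aggregated posterior over the latent space, $P_Z$ the latent prior, and $\mathcal{D}$ a divergence whose weight $\lambda$ penalizes the mismatch between the two; minimizing this penalized form upper-bounds the optimal transport cost between the data distribution and the model.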
Original language | English |
---|---|
Title of host publication | The Eleventh International Conference on Learning Representations |
Subtitle of host publication | ICLR 2023 |
Place of Publication | Kigali, Rwanda |
Publisher | OpenReview.net |
Pages | 1-30 |
Number of pages | 30 |
Volume | 11 |
Publication status | Published - 1 Feb 2023 |
Event | The Eleventh International Conference on Learning Representations (ICLR 2023), Kigali Convention Centre, Kigali, Rwanda. Duration: 1 May 2023 → 5 May 2023. Conference number: 11. https://iclr.cc/Conferences/2023 |
Conference
Conference | The Eleventh International Conference on Learning Representations |
---|---|
Abbreviated title | ICLR 2023 |
Country/Territory | Rwanda |
City | Kigali |
Period | 1/05/23 → 5/05/23 |
Internet address | https://iclr.cc/Conferences/2023 |
Bibliographical note
Funding Information: This research received funding from the Flemish Government (AI Research Program) and was supported by the DESCARTES iBOF project. G.A. Perez is also supported by the Belgian FWO "SAILor" project (G030020N). We thank Raphael Avalos for his valuable feedback during the preparation of this manuscript.
Publisher Copyright:
© 2023 11th International Conference on Learning Representations, ICLR 2023. All rights reserved.
Keywords
- Reinforcement learning
- Formal Verification
- Representation Learning
Fingerprint
Dive into the research topics of 'Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees'. Together they form a unique fingerprint.

Projects
- 2 Active
- VLAAI1: Flanders Artificial Intelligence Research program (FAIR) – second cycle
  1/01/24 → 31/12/28
  Project: Applied
- iBOF/21/027: DESCARTES - infectious DisEaSe eConomics and Ai with guaRanTEeS
  Nowe, A., Hens, N. & Beutels, P.
  1/01/21 → 31/12/26
  Project: Fundamental
- Activating formal verification of deep reinforcement learning policies by model checking bisimilar latent space models
  Delgrange, F., 2024, VUB Press. 348 p. Research output: Thesis › PhD Thesis (Open Access)
- Controller Synthesis from Deep Reinforcement Learning Policies: Extended Abstract
  Delgrange, F., Avni, G., Lukina, A., Schilling, C., Nowe, A. & Pérez, G. A., 18 Nov 2024. Research output: Unpublished contribution to conference › Unpublished abstract (Open Access)