Abstract
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Original language | English |
---|---|
Title of host publication | Proceedings of the AAAI Conference on Artificial Intelligence |
Subtitle of host publication | Vol. 36 No. 6: AAAI-22 Technical Tracks 6 |
Place of Publication | Palo Alto, California USA |
Publisher | AAAI Press |
Chapter | 6 |
Pages | 6497-6505 |
Number of pages | 9 |
Volume | 36 |
Edition | First |
ISBN (Print) | 1-57735-876-7, 978-1-57735-876-3 |
Publication status | Published - 28 Jun 2022 |
Event | 36th AAAI Conference on Artificial Intelligence (conference number 36), 22 Feb 2022 → 1 Mar 2022, https://aaai.org/Conferences/AAAI-22/ |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Publisher | AAAI Press |
Number | 6 |
Volume | 36 |
ISSN (Print) | 2159-5399 |
ISSN (Electronic) | 2374-3468 |
Conference
Conference | 36th AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI |
Period | 22/02/22 → 1/03/22 |
Keywords
- Machine Learning
- Artificial Intelligence
- Formal Methods
- Reinforcement Learning
- Knowledge Representation And Reasoning
- Reasoning Under Uncertainty
Projects
- 2 Active
- iBOF/21/027: DESCARTES - infectious DisEaSe eConomics and Ai with guaRanTEeS
  Nowe, A., Hens, N. & Beutels, P.
  1/01/21 → 31/12/24
  Project: Fundamental
- VLAAI1: Subsidie: Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen
  1/07/19 → 31/12/23
  Project: Applied
Research output
- 1 Unpublished abstract
- Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes: Encore Abstract
  Delgrange, F., Nowe, A. & Pérez, G. A., 7 Nov 2022.
  Research output: Unpublished contribution to conference › Unpublished abstract
  Open Access
Activities
- 1 Talk or presentation at a conference
- Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees
  Florent Delgrange (Speaker), Ann Nowe (Contributor) & Guillermo A. Pérez (Contributor)
  7 Nov 2022 → 9 Nov 2022
  Activity: Talk or presentation › Talk or presentation at a conference