Abstract
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Original language | English |
---|---|
Title | Proceedings of the AAAI Conference on Artificial Intelligence |
Subtitle | Vol. 36 No. 6: AAAI-22 Technical Tracks 6 |
Place of publication | Palo Alto, California USA |
Publisher | AAAI Press |
Chapter | 6 |
Pages | 6497-6505 |
Number of pages | 9 |
Volume | 36 |
Edition | First |
ISBN (electronic) | 1577358767, 9781577358763 |
ISBN (print) | 1-57735-876-7, 978-1-57735-876-3 |
DOIs | |
Status | Published - 28 Jun 2022 |
Event | 36th AAAI Conference on Artificial Intelligence - Duration: 22 Feb 2022 → 1 Mar 2022, Conference number: 36, https://aaai.org/Conferences/AAAI-22/ |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Publisher | AAAI Press |
Number | 6 |
Volume | 36 |
ISSN (print) | 2159-5399 |
ISSN (electronic) | 2374-3468 |
Conference
Conference | 36th AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI |
Period | 22/02/22 → 1/03/22 |
Internet address |
Bibliographical note
Publisher Copyright: Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Fingerprint
Dive into the research topics of 'Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes'. Together they form a unique fingerprint.
-
VLAAI1: Vlaams Artificiële Intelligentie Onderzoeksprogramma (VAIOP) – tweede cyclus
1/01/24 → 31/12/28
Project: Applied
-
iBOF/21/027: DESCARTES - infectieziekten economie en artificiële intelligentie met garanties
Nowe, A., Hens, N. & Beutels, P.
1/01/21 → 31/12/26
Project: Fundamental
-
Activating formal verification of deep reinforcement learning policies by model checking bisimilar latent space models
Delgrange, F., 2024, VUB Press. 348 p. Research output: PhD Thesis
Open Access, File
-
Controller Synthesis from Deep Reinforcement Learning Policies: Extended Abstract
Delgrange, F., Avni, G., Lukina, A., Schilling, C., Nowe, A. & Pérez, G. A., 18 Nov 2024. Research output: Unpublished abstract
Open Access, File
-
Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes: Encore Abstract
Delgrange, F., Nowe, A. & Pérez, G. A., 7 Nov 2022. Research output: Unpublished abstract
Open Access, File
Activities
- 1 Talk or presentation at a conference
-
Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees
Florent Delgrange (Speaker), Ann Nowe (Contributor) & Guillermo A. Pérez (Contributor)
7 Nov 2022 → 9 Nov 2022. Activity: Talk or presentation at a conference
File