Abstract
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Original language | English |
---|---|
Title | Proceedings of the AAAI Conference on Artificial Intelligence |
Subtitle | Vol. 36 No. 6: AAAI-22 Technical Tracks 6 |
Place of publication | Palo Alto, California USA |
Publisher | AAAI Press |
Chapter | 6 |
Pages | 6497-6505 |
Number of pages | 9 |
Volume | 36 |
Edition | First |
Print ISBN | 1-57735-876-7, 978-1-57735-876-3 |
DOIs | |
Status | Published - 28 Jun 2022 |
Event | 36th AAAI Conference on Artificial Intelligence - Duration: 22 Feb 2022 → 1 Mar 2022 Conference number: 36 https://aaai.org/Conferences/AAAI-22/ |
Publication series
Name | Proceedings of the AAAI Conference on Artificial Intelligence |
---|---|
Publisher | AAAI Press |
Number | 6 |
Volume | 36 |
Print ISSN | 2159-5399 |
Electronic ISSN | 2374-3468 |
Conference
Conference | 36th AAAI Conference on Artificial Intelligence |
---|---|
Abbreviated title | AAAI |
Period | 22/02/22 → 1/03/22 |
Internet address |
Fingerprint
Dive into the research topics of 'Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes'. Together they form a unique fingerprint.
Projects
- 2 Active
- iBOF/21/027: DESCARTES - infectious disease economics and artificial intelligence with guarantees
  Nowe, A., Hens, N. & Beutels, P.
  1/01/21 → 31/12/24
  Project: Fundamental
- VLAAI1: Grant: Flanders Artificial Intelligence (AI) Research Programme
  1/07/19 → 31/12/23
  Project: Applied