Abstract
We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
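To make concrete what "the application of formal methods for Markov decision processes" can look like once a discrete latent model is available, here is a minimal sketch of a safety-constrained reachability check. It is not the authors' implementation. The four-state transition matrix, the state labels, and the function name `constrained_reachability` are hypothetical and chosen only for illustration. It assumes the distilled policy has already been fixed, so the latent model reduces to a Markov chain.

```python
# Hypothetical 4-state latent model under a fixed (distilled) policy.
# P[s][t] is the probability of moving from latent state s to latent state t.
P = [
    [0.0, 0.8, 0.1, 0.1],  # state 0
    [0.1, 0.0, 0.1, 0.8],  # state 1
    [0.0, 0.0, 1.0, 0.0],  # state 2: unsafe, absorbing
    [0.0, 0.0, 0.0, 1.0],  # state 3: goal, absorbing
]
GOAL = {3}
UNSAFE = {2}


def constrained_reachability(P, goal, unsafe, tol=1e-12, max_iters=100_000):
    """Probability, per state, of reaching `goal` without visiting `unsafe`.

    Plain value iteration started from 0 on non-goal states, which
    converges to the least fixed point of the reachability equations.
    """
    n = len(P)
    x = [1.0 if s in goal else 0.0 for s in range(n)]
    for _ in range(max_iters):
        new_x = []
        for s in range(n):
            if s in goal:
                new_x.append(1.0)   # already at the goal
            elif s in unsafe:
                new_x.append(0.0)   # the safety constraint is violated
            else:
                new_x.append(sum(P[s][t] * x[t] for t in range(n)))
        if max(abs(a - b) for a, b in zip(x, new_x)) < tol:
            return new_x
        x = new_x
    return x


probs = constrained_reachability(P, GOAL, UNSAFE)
print(probs[0])  # probability of safely reaching the goal from state 0
```

The sketch only covers the model-checking step on the latent model. Relating the computed value to the unknown continuous environment is the role of the bisimulation bounds described in the abstract and is not modelled here.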
Original language | English |
---|---|
Status | Published - 7 Nov 2022 |
Event | BNAIC/BeNeLearn 2022: Joint International Scientific Conferences on AI and Machine Learning - Lamot Mechelen, Belgium. Duration: 7 Nov 2022 → 9 Nov 2022. https://bnaic2022.uantwerpen.be/ |
Conference
Conference | BNAIC/BeNeLearn 2022 |
---|---|
Abbreviated title | BNAIC/BeNeLearn 2022 |
Country/Territory | Belgium |
City | Lamot Mechelen |
Period | 7/11/22 → 9/11/22 |
Internet address | https://bnaic2022.uantwerpen.be/ |
Fingerprint
Dive into the research topics of 'Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes: Encore Abstract'. Together they form a unique fingerprint.
Projects
- 2 Active
- VLAAI1: Vlaams Artificiële Intelligentie Onderzoeksprogramma (VAIOP) – tweede cyclus
  1/01/24 → 31/12/28
  Project: Applied
- iBOF/21/027: DESCARTES - infectieziekten economie en artificiële intelligentie met garanties
  Nowé, A., Hens, N. & Beutels, P.
  1/01/21 → 31/12/26
  Project: Fundamental
Research output
- 1 Conference paper
- Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes
  Delgrange, F., Nowé, A. & Pérez, G. A., 28 Jun 2022, Proceedings of the AAAI Conference on Artificial Intelligence: Vol. 36 No. 6: AAAI-22 Technical Tracks 6. First ed. Palo Alto, California USA: AAAI Press, Vol. 36. p. 6497-6505 9 p. (Proceedings of the AAAI Conference on Artificial Intelligence; vol. 36, no. 6).
  Research output: Conference paper
  Open Access. 6 Citations (Scopus). 47 Downloads (Pure).