Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes: Encore Abstract

Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Research output: Unpublished abstract


Abstract

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
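The pipeline the abstract describes, collecting transitions under a trained RL policy, mapping continuous states to a discrete latent space, estimating a latent MDP, and distilling a latent policy, can be sketched minimally in Python. This is an illustrative toy only, not the paper's method: the learned variational encoder is replaced by a fixed uniform quantizer, and the "trained policy" and 1-D environment are hand-written assumptions.

```python
import numpy as np

# Toy 1-D continuous MDP: state in [0, 1], actions {0: left, 1: right}.
def step(s, a, rng):
    drift = 0.1 if a == 1 else -0.1
    return float(np.clip(s + drift + rng.normal(0.0, 0.02), 0.0, 1.0))

# Hand-written stand-in for a policy obtained via RL:
# push right until the goal region [0.8, 1] is reached.
def rl_policy(s):
    return 1 if s < 0.8 else 0

rng = np.random.default_rng(0)
K = 10  # number of discrete latent states

# Stand-in encoder: uniform quantization of [0, 1] into K bins
# (the paper instead *learns* this map with a variational autoencoder).
def encode(s):
    return min(int(s * K), K - 1)

# Collect transitions under the trained policy.
counts = np.zeros((K, 2, K))    # visit counts for (z, a, z')
action_counts = np.zeros((K, 2))
s = 0.0
for _ in range(20000):
    a = rl_policy(s)
    s2 = step(s, a, rng)
    z, z2 = encode(s), encode(s2)
    counts[z, a, z2] += 1
    action_counts[z, a] += 1
    # occasional reset to a uniform random state for coverage
    s = s2 if rng.random() > 0.01 else float(rng.random())

# Latent MDP: empirical transition probabilities P(z' | z, a).
P = counts / np.maximum(counts.sum(axis=2, keepdims=True), 1)

# Distilled latent policy: most frequent action in each latent state.
distilled = action_counts.argmax(axis=1)
```

The resulting finite transition matrix `P` is exactly the kind of object that model checkers for MDPs can analyze for reachability and safety objectives, which is what the bisimulation bounds make meaningful for the original continuous environment.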
Original language: English
Status: Published - 7 Nov 2022
Event: BNAIC/BeNeLearn 2022: Joint International Scientific Conferences on AI and Machine Learning - Lamot, Mechelen, Belgium
Duration: 7 Nov 2022 - 9 Nov 2022
https://bnaic2022.uantwerpen.be/

Conference

Conference: BNAIC/BeNeLearn 2022
Abbreviated title: BNAIC/BeNeLearn 2022
Country/Territory: Belgium
City: Lamot, Mechelen
Period: 7/11/22 - 9/11/22
