Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes: Encore Abstract

Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Research output: Unpublished contribution to conference › Unpublished abstract


Abstract

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
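
The abstract describes training a variational autoencoder on transitions collected under a pretrained RL policy, so that continuous states are mapped to a discrete latent MDP that can be analysed with standard probabilistic model checkers. The sketch below is an illustrative assumption of what such a pipeline could look like, not the authors' implementation: an encoder discretizes states via a Gumbel-softmax relaxation, and a tabular latent transition and reward model is fitted to the observed transitions. The names (StateEncoder, LatentModel, loss_on_batch), the dimensions, and the use of PyTorch are all hypothetical choices made for illustration.

    # Minimal sketch (assumption, not the authors' code): fitting a discrete
    # latent MDP abstraction from transitions gathered under a pretrained RL policy.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    LATENT = 32       # number of discrete latent states (illustrative)
    STATE_DIM = 8     # continuous state dimension (illustrative)
    N_ACTIONS = 4     # discrete action space (illustrative)

    class StateEncoder(nn.Module):
        """Maps a continuous state to a (relaxed) one-hot discrete latent state."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                     nn.Linear(64, LATENT))
        def forward(self, s, tau=0.5):
            # Gumbel-softmax keeps the discrete abstraction differentiable.
            return F.gumbel_softmax(self.net(s), tau=tau, hard=True)

    class LatentModel(nn.Module):
        """Tabular transition logits and rewards over (latent state, action) pairs."""
        def __init__(self):
            super().__init__()
            self.trans = nn.Parameter(torch.zeros(LATENT, N_ACTIONS, LATENT))
            self.reward = nn.Parameter(torch.zeros(LATENT, N_ACTIONS))

    encoder, model = StateEncoder(), LatentModel()
    opt = torch.optim.Adam(list(encoder.parameters()) + list(model.parameters()), lr=1e-3)

    def loss_on_batch(s, a, r, s_next):
        """Loss on transitions (s, a, r, s') collected under the RL policy."""
        z, z_next = encoder(s), encoder(s_next)                     # one-hot latent states
        trans_logits = torch.einsum('bi,iaj->baj', z, model.trans)  # (batch, action, next latent)
        a_onehot = F.one_hot(a, N_ACTIONS).float()
        next_logits = torch.einsum('baj,ba->bj', trans_logits, a_onehot)
        # Transition loss: the latent model should predict the encoded next state.
        trans_loss = F.cross_entropy(next_logits, z_next.argmax(dim=-1))
        # Reward loss: the latent reward should match the observed reward.
        r_pred = torch.einsum('bi,ia,ba->b', z, model.reward, a_onehot)
        reward_loss = F.mse_loss(r_pred, r)
        return trans_loss + reward_loss

    # Illustrative training step on random tensors standing in for replay data.
    s = torch.randn(128, STATE_DIM); a = torch.randint(0, N_ACTIONS, (128,))
    r = torch.randn(128); s_next = torch.randn(128, STATE_DIM)
    opt.zero_grad(); loss_on_batch(s, a, r, s_next).backward(); opt.step()

Once trained, the resulting finite latent MDP and a policy distilled onto its latent states could, in principle, be exported to a probabilistic model checker to verify reachability, safety-constrained reachability, or discounted-reward objectives; the bisimulation bounds from the abstract are what relate such verification results back to the original continuous environment.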
Original language: English
Publication status: Published - 7 Nov 2022
Event: BNAIC/BeNeLearn 2022: Joint International Scientific Conferences on AI and Machine Learning - Lamot Mechelen, Belgium
Duration: 7 Nov 2022 – 9 Nov 2022
https://bnaic2022.uantwerpen.be/

Conference

Conference: BNAIC/BeNeLearn 2022
Abbreviated title: BNAIC/BeNeLearn 2022
Country/Territory: Belgium
City: Lamot Mechelen
Period: 7/11/22 – 9/11/22
Internet address: https://bnaic2022.uantwerpen.be/

Keywords

  • Reinforcement Learning
  • Formal Methods
  • Representation Learning

