Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes

Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Research output: Chapter in Book/Report/Conference proceeding > Conference paper

Abstract

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Subtitle of host publication: Vol. 36 No. 6: AAAI-22 Technical Tracks 6
Place of Publication: Palo Alto, California, USA
Publisher: AAAI Press
Chapter: 6
Pages: 6497-6505
Number of pages: 9
Volume: 36
Edition: First
ISBN (Print): 1-57735-876-7, 978-1-57735-876-3
DOIs
Publication status: Published - 28 Jun 2022
Event: 36th AAAI Conference on Artificial Intelligence
Duration: 22 Feb 2022 to 1 Mar 2022
Conference number: 36
https://aaai.org/Conferences/AAAI-22/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Number: 6
Volume: 36
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 36th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Period: 22/02/22 to 1/03/22
Internet address

Keywords

  • Machine Learning
  • Artificial Intelligence
  • Formal Methods
  • Reinforcement Learning
  • Knowledge Representation And Reasoning
  • Reasoning Under Uncertainty
