Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes

Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Research output: Chapter in Book/Report/Conference proceeding › Conference paper


Abstract

We consider the challenge of policy simplification and verification for policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how a policy obtained via state-of-the-art RL can be used to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. As a by-product, we obtain a distilled version of the policy for the latent model.
Original language: English
Title of host publication: Proceedings of the AAAI Conference on Artificial Intelligence
Subtitle of host publication: Vol. 36 No. 6: AAAI-22 Technical Tracks 6
Place of Publication: Palo Alto, California, USA
Publisher: AAAI Press
Chapter: 6
Pages: 6497-6505
Number of pages: 9
Volume: 36
Edition: First
ISBN (Print): 1-57735-876-7, 978-1-57735-876-3
Publication status: Published, 28 Jun 2022
Event: 36th AAAI Conference on Artificial Intelligence
Duration: 22 Feb 2022 to 1 Mar 2022
Conference number: 36
https://aaai.org/Conferences/AAAI-22/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Number: 6
Volume: 36
ISSN (Print): 2159-5399
ISSN (Electronic): 2374-3468

Conference

Conference: 36th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Period: 22/02/22 to 1/03/22

Keywords

  • Machine Learning
  • Artificial Intelligence
  • Formal Methods
  • Reinforcement Learning
  • Knowledge Representation And Reasoning
  • Reasoning Under Uncertainty
