Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes

Florent Delgrange, Ann Nowé, Guillermo A. Pérez

Research output: Conference paper

6 Citations (Scopus)
47 Downloads (Pure)

Abstract

We consider the challenge of policy simplification and verification in the context of policies learned through reinforcement learning (RL) in continuous environments. In well-behaved settings, RL algorithms have convergence guarantees in the limit. While these guarantees are valuable, they are insufficient for safety-critical applications. Furthermore, they are lost when applying advanced techniques such as deep-RL. To recover guarantees when applying advanced RL algorithms to more complex environments with (i) reachability, (ii) safety-constrained reachability, or (iii) discounted-reward objectives, we build upon the DeepMDP framework introduced by Gelada et al. to derive new bisimulation bounds between the unknown environment and a learned discrete latent model of it. Our bisimulation bounds enable the application of formal methods for Markov decision processes. Finally, we show how one can use a policy obtained via state-of-the-art RL to efficiently train a variational autoencoder that yields a discrete latent model with provably approximately correct bisimulation guarantees. Additionally, we obtain a distilled version of the policy for the latent model.
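
As an illustration of the kind of formal analysis that a discrete latent model makes possible, here is a minimal sketch of value iteration for maximal reachability probabilities on a small finite MDP. The three-state transition tensor, the function name `max_reachability`, and the tolerance are assumptions made for illustration only; they are not taken from the paper, which uses its own learned models and verification tooling.

```python
import numpy as np


def max_reachability(P, goal, tol=1e-10, max_iter=10_000):
    """Maximal probability of eventually reaching a goal state in a finite MDP.

    P    : array of shape (A, S, S); P[a, s, t] is the probability of moving
           from state s to state t under action a (hypothetical toy model).
    goal : iterable of goal-state indices.
    """
    n_states = P.shape[1]
    is_goal = np.zeros(n_states, dtype=bool)
    is_goal[list(goal)] = True

    # Start from 0 outside the goal so the iteration converges from below
    # to the least fixed point, which is the reachability probability.
    v = is_goal.astype(float)
    for _ in range(max_iter):
        q = P @ v  # shape (A, S): expected next value for each action and state
        new_v = np.where(is_goal, 1.0, q.max(axis=0))
        if np.max(np.abs(new_v - v)) < tol:
            return new_v
        v = new_v
    return v


# Toy MDP (an assumption): state 0 = start, state 1 = goal, state 2 = unsafe sink.
P = np.array([
    # action 0: a risky jump from the start state
    [[0.0, 0.5, 0.5],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
    # action 1: a more careful move that may stay in place
    [[0.1, 0.8, 0.1],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
])

values = max_reachability(P, goal=[1])
print(values)
```

In this toy model, always choosing action 1 in state 0 gives a reachability probability x satisfying x = 0.8 + 0.1x, so x = 8/9, which is better than the 0.5 obtained with action 0. The paper's bisimulation bounds are what would justify transferring such a value from a latent model back to the original environment; this sketch does not compute those bounds.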
Original language: English
Title: Proceedings of the AAAI Conference on Artificial Intelligence
Subtitle: Vol. 36 No. 6: AAAI-22 Technical Tracks 6
Place of publication: Palo Alto, California USA
Publisher: AAAI Press
Chapter: 6
Pages: 6497-6505
Number of pages: 9
Volume: 36
Edition: First
Electronic ISBN: 1577358767, 9781577358763
Print ISBN: 1-57735-876-7, 978-1-57735-876-3
Status: Published - 28 Jun 2022
Event: 36th AAAI Conference on Artificial Intelligence
Duration: 22 Feb 2022 to 1 Mar 2022
Conference number: 36
https://aaai.org/Conferences/AAAI-22/

Publication series

Name: Proceedings of the AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Number: 6
Volume: 36
Print ISSN: 2159-5399
Electronic ISSN: 2374-3468

Conference

Conference: 36th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI
Period: 22/02/22 to 1/03/22

Bibliographical note

Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
