Sample-Efficient Model-Free Reinforcement Learning with Off-Policy Critics

Research output: Conference paper

4 Citations (Scopus)
122 Downloads (Pure)

Abstract

Value-based reinforcement-learning algorithms provide state-of-the-art results in model-free discrete-action settings, and tend to outperform actor-critic algorithms. We argue that actor-critic algorithms are limited by their need for an on-policy critic. We propose Bootstrapped Dual Policy Iteration (BDPI), a novel model-free reinforcement-learning algorithm for continuous states and discrete actions, with an actor and several off-policy critics. Off-policy critics are compatible with experience replay, ensuring high sample-efficiency, without the need for off-policy corrections. The actor, by slowly imitating the average greedy policy of the critics, leads to high-quality and state-specific exploration, which we compare to Thompson sampling. Because the actor and critics are fully decoupled, BDPI is remarkably stable, and unusually robust to its hyper-parameters. BDPI is significantly more sample-efficient than Bootstrapped DQN, PPO, and ACKTR, on discrete, continuous and pixel-based tasks.
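To make the core mechanism concrete, the following is a minimal sketch (an assumption for illustration, not the paper's reference implementation) of the actor update the abstract describes: the actor slowly imitates the average greedy policy of several off-policy critics. The tabular state space, the critic Q-tables, and names such as actor_lr and update_actor are illustrative placeholders; the paper targets continuous states, where the critics would be neural networks.

    import numpy as np

    num_states, num_actions, num_critics = 10, 4, 8
    actor_lr = 0.05  # how quickly the actor moves toward the critics' greedy policy

    # One Q-table per off-policy critic (tabular here purely for simplicity).
    critics = [np.zeros((num_states, num_actions)) for _ in range(num_critics)]

    # The actor is an explicit policy: a distribution over actions for each state.
    actor = np.full((num_states, num_actions), 1.0 / num_actions)

    def update_actor(actor, critics, lr):
        """Move the actor toward the average greedy policy of the critics."""
        greedy = np.zeros_like(actor)
        for q in critics:
            best = q.argmax(axis=1)  # greedy action per state for this critic
            greedy[np.arange(len(best)), best] += 1.0 / len(critics)
        return (1.0 - lr) * actor + lr * greedy  # slow imitation step

    actor = update_actor(actor, critics, actor_lr)
    assert np.allclose(actor.sum(axis=1), 1.0)  # still a valid distribution

Sampling actions from this mixture of greedy policies is what the abstract compares to Thompson sampling: disagreement among the critics keeps exploration state-specific, while agreement makes the actor increasingly deterministic.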
Original language: English
Title: Lecture Notes in Artificial Intelligence
Subtitle: Machine Learning and Knowledge Discovery in Databases (ECML-PKDD proceedings), volume III
Editors: Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet
Publisher: Springer
Pages: 19-34
Number of pages: 16
Volume: 11908
ISBN (electronic): 978-3-030-46133-1
ISBN (print): 978-3-030-46132-4
DOIs:
Status: Published - 2020
Event: European Conference on Machine Learning 2019 - Wurzburg University, Wurzburg, Germany
Duration: 16 Sep 2019 – 20 Sep 2019
https://ecmlpkdd2019.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11908 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning 2019
Abbreviated title: ECMLPKDD
Country/Region: Germany
City: Wurzburg
Period: 16/09/19 – 20/09/19
Internet address

Bibliographical note

Pages (from-to) not filled in, because this information was not disclosed to the authors by ECML, and Springer does not publish a table of contents for the proceedings volume; obtaining it would require purchasing the book.
