Reinforcement Learning in POMDPs with Memoryless Options and Option-Observation Initiation Sets

Research output: Conference paper, Research

9 Citations (Scopus)
154 Downloads (Pure)


Many real-world reinforcement learning problems have a hierarchical nature, and often exhibit some degree of partial observability. While hierarchy and partial observability are usually tackled separately (for instance by combining recurrent neural networks and options), we show that addressing both problems simultaneously is simpler and more efficient in many cases. More specifically, we make the initiation set of options conditional on the previously-executed option, and show that options with such Option-Observation Initiation Sets (OOIs) are at least as expressive as Finite State Controllers (FSCs), a state-of-the-art approach for learning in POMDPs. OOIs are easy to design based on an intuitive description of the task, lead to explainable policies and keep the top-level and option policies memoryless. Our experiments show that OOIs allow agents to learn optimal policies in challenging POMDPs, while being much more sample-efficient than a recurrent neural network over options.
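The core idea in the abstract, an initiation set that depends on the previously executed option, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Option` class, the `allowed_after` field, the `"START"` marker, the option names and the `preferences` table are all hypothetical, and the initiation sets here are keyed only on the previous option for brevity.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    # Hypothetical encoding of an OOI: names of the options after which
    # this option is allowed to start.
    allowed_after: frozenset


def available_options(options, previous_name):
    """Options whose initiation set admits the previously executed option."""
    return [o for o in options if previous_name in o.allowed_after]


def select_option(options, previous_name, observation, preferences, rng):
    """Memoryless top-level choice: it depends only on the current observation
    and on which option just terminated (through the initiation sets)."""
    candidates = available_options(options, previous_name)
    if not candidates:
        raise ValueError(f"no option may follow {previous_name!r}")
    # Unlisted (observation, option) pairs get a default weight of 1.0.
    weights = [preferences.get((observation, o.name), 1.0) for o in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical toy task: visit a terminal first, then head to one of two goals.
OPTIONS = [
    Option("go_to_terminal", frozenset({"START", "go_left", "go_right"})),
    Option("go_left", frozenset({"go_to_terminal"})),
    Option("go_right", frozenset({"go_to_terminal"})),
]
```

In this sketch the only memory carried between decisions is the name of the option that just finished, so the top-level policy itself stays memoryless; a learned policy would replace the fixed `preferences` table.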
Original language: English
Title of host publication: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Number of pages: 8
ISBN (electronic): 9781577358008
ISBN (print): 978-1-57735-800-8
Status: Published - 4 Feb 2018
Event: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018) - Hilton Riverside Hotel, New Orleans, United States
Duration: 2 Feb 2018 to 7 Feb 2018
Conference number: 32

Publication series

Name: AAAI Conference on Artificial Intelligence
Publisher: Association for the Advancement of Artificial Intelligence
ISSN (print): 2159-5399
ISSN (electronic): 2374-3468


Conference: Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018)
Abbreviated title: AAAI
Country/Territory: United States
City: New Orleans

