Abstract

Reinforcement Learning usually does not scale well to large problems. A Reinforcement Learning agent typically needs many trials before it reaches a satisfactory policy. A main contributing factor is that Reinforcement Learning is often used to learn exclusively by trial and error. Much prior work addresses incorporating domain knowledge into Reinforcement Learning to allow more efficient learning. Reward shaping is a well-established method for doing so: it provides the learning agent with a supplementary reward. In this work we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic formulas. Linear Temporal Logic serves in our work as a rich yet compact language that allows the user to express domain knowledge with minimal effort. Linear Temporal Logic formulas are also relatively easy to express in natural language, which makes the approach more accessible to non-expert users. We use the flag collection domain to demonstrate empirically an increase in both the convergence speed and the quality of the learned policy, despite the minimal domain knowledge provided.
Original language: English
Title: Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS
Number of pages: 6
Status: Published - 27 Apr 2021
Event: Adaptive and Learning Agents Workshop 2021
Duration: 3 May 2021 to 4 May 2021

Workshop

Workshop: Adaptive and Learning Agents Workshop 2021
Short title: ALA2021
Period: 3/05/21 to 4/05/21

Fingerprint

Dive into the research topics of 'LTLf-based Reward Shaping for Reinforcement Learning'. Together they form a unique fingerprint.
