
Abstract

Reinforcement Learning usually does not scale well to large problems. A Reinforcement Learning agent typically needs many trials before it reaches a satisfactory policy. A main contributing factor is that Reinforcement Learning often learns exclusively by trial and error. Much work has addressed incorporating domain knowledge into Reinforcement Learning to make learning more efficient. Reward shaping is a well-established method for incorporating such knowledge: it provides the learning agent with a supplementary reward. In this work we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic formulas. Linear Temporal Logic serves in our work as a rich yet compact language that allows the user to express domain knowledge with minimal effort. Linear Temporal Logic formulas are also fairly easy to express in natural language, which makes the approach more accessible to non-expert users. We use the flag collection domain to demonstrate empirically an increase in both the convergence speed and the quality of the learned policy, despite the minimal domain knowledge provided.
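To illustrate the general idea of deriving a supplementary reward from a temporal formula, the following is a minimal sketch and not the paper's algorithm, which the abstract does not spell out. It assumes a hypothetical formula ("eventually collect flag A, and afterwards eventually collect flag B"), a hand-written three-state automaton tracking progress through that formula, and a potential equal to the automaton state index. The shaping term uses the standard potential-based form, gamma * Phi(q') - Phi(q). All names (`TRANSITIONS`, `step_automaton`, `potential`, `shaping_reward`) and the event labels are illustrative assumptions.

```python
# Hypothetical sketch of potential-based reward shaping driven by progress
# through an automaton for a temporal formula. Not the paper's method.

GAMMA = 0.99  # assumed discount factor

# Hand-written automaton for the assumed formula
# "eventually flag_a, and afterwards eventually flag_b".
# State 0: nothing collected, 1: flag A collected, 2: both collected (accepting).
TRANSITIONS = {
    (0, "flag_a"): 1,
    (1, "flag_b"): 2,
}


def step_automaton(q, events):
    """Advance automaton state q given the set of events observed this step."""
    for event in events:
        nxt = TRANSITIONS.get((q, event))
        if nxt is not None:
            return nxt
    return q  # no matching transition: stay in the same state


def potential(q):
    """Assumed heuristic: potential grows with progress toward acceptance."""
    return float(q)


def shaping_reward(q, q_next, gamma=GAMMA):
    """Potential-based shaping term: gamma * Phi(q_next) - Phi(q)."""
    return gamma * potential(q_next) - potential(q)


if __name__ == "__main__":
    q = 0
    # Assumed event trace from one episode: nothing, flag A, nothing, flag B.
    for events in [set(), {"flag_a"}, set(), {"flag_b"}]:
        q_next = step_automaton(q, events)
        print(q, "->", q_next, "shaping:", shaping_reward(q, q_next))
        q = q_next
```

In use, the shaping term would be added to the environment reward at every step, with the automaton state tracked alongside the environment state. Steps that advance the automaton earn a positive bonus, while steps that make no progress incur a small penalty when the potential is positive.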
Original language: English
Title of host publication: Proceedings of the Adaptive and Learning Agents Workshop 2021 (ALA2021) at AAMAS
Number of pages: 6
Publication status: Published - 27 Apr 2021
Event: Adaptive and Learning Agents Workshop 2021
Duration: 3 May 2021 to 4 May 2021

Workshop

Workshop: Adaptive and Learning Agents Workshop 2021
Abbreviated title: ALA2021
Period: 3/05/21 to 4/05/21

Keywords

  • Reinforcement Learning
  • Reward Shaping
  • Linear Temporal Logic on finite traces

Title: LTLf-based Reward Shaping for Reinforcement Learning
