Reinforcement Learning (RL) enables artificial agents to learn through direct interaction with their environment. However, it usually scales poorly to large problems because of its sample inefficiency. Reward shaping is a well-established approach that makes learning more efficient by incorporating domain knowledge into RL agents via supplementary rewards. In this work, we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic on finite traces (LTLf) formulas. In our work, LTLf serves as a rich language through which the user can communicate domain knowledge to the learning agent. In both single-agent and multi-agent settings, we demonstrate that our approach performs at least as well as the baseline approach while providing essential advantages in flexibility and ease of use. We illustrate some of these advantages empirically by demonstrating that our approach can handle domain knowledge of varying accuracy and gives the user the flexibility to express uncertainty in the provided advice.
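
To make the idea of shaping rewards from temporal advice concrete, the following is a minimal sketch and not the construction used in the paper, which the abstract does not detail. It assumes the advice "first get the key, then reach the door" has already been compiled by hand into a small automaton, and it uses standard potential-based reward shaping, F = γΦ(q') − Φ(q), with a potential that grows as the automaton gets closer to acceptance. The automaton, the symbols `"key"` and `"door"`, and all function names are hypothetical.

```python
from collections import deque

GAMMA = 0.99  # discount factor (assumed value, for illustration only)

# Hypothetical hand-built automaton for the advice
# "first get the key, then reach the door".
# Keys are (state, observed symbol); values are the next automaton state.
TRANSITIONS = {(0, "key"): 1, (1, "door"): 2}
INITIAL = 0
ACCEPTING = {2}


def dfa_step(q, symbol):
    """Advance the automaton; pairs not listed leave the state unchanged."""
    return TRANSITIONS.get((q, symbol), q)


def distances_to_accepting():
    """Breadth-first search over reversed edges.

    Returns, for every state that can reach acceptance, the minimum
    number of automaton transitions needed to get there.
    """
    dist = {q: 0 for q in ACCEPTING}
    queue = deque(ACCEPTING)
    while queue:
        target = queue.popleft()
        for (src, _symbol), dst in TRANSITIONS.items():
            if dst == target and src not in dist:
                dist[src] = dist[target] + 1
                queue.append(src)
    return dist


DIST = distances_to_accepting()
MAX_DIST = max(DIST.values())


def potential(q):
    """Potential is higher the closer q is to acceptance (0 if unreachable)."""
    return float(MAX_DIST - DIST[q]) if q in DIST else 0.0


def shaping_reward(q, q_next, gamma=GAMMA):
    """Potential-based shaping term added to the environment reward."""
    return gamma * potential(q_next) - potential(q)


if __name__ == "__main__":
    q = INITIAL
    for symbol in ["none", "key", "none", "door"]:
        q_next = dfa_step(q, symbol)
        print(symbol, q, q_next, shaping_reward(q, q_next))
        q = q_next
```

In use, the agent would track the automaton state alongside the environment state and add `shaping_reward(q, q_next)` to the environment reward at each step. Because the bonus is a difference of potentials, progress along the advice is rewarded without changing which policies are optimal for the underlying task.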
Original language: English
Article number: 1
Pages (from-to): 1-17
Number of pages: 17
Journal: Neural Computing & Applications
Early online date: 7 Jun 2022
Publication status: Published - 7 Jun 2022


Research topics of 'A Framework for Flexibly Guiding Learning Agents':

  • Reinforcement Learning
  • Reward Shaping
  • Linear Temporal Logic on finite traces
  • Multi-agent Systems
