Reinforcement Learning (RL) enables artificial agents to learn through direct interaction with the environment. However, it usually does not scale well to large problems because of its sample inefficiency. Reward shaping is a well-established approach that makes learning more efficient by incorporating domain knowledge into RL agents via supplementary rewards. In this work, we propose a novel methodology that automatically generates reward shaping functions from user-provided Linear Temporal Logic on finite traces (LTLf) formulas. In our work, LTLf serves as a rich language through which the user can communicate domain knowledge to the learning agent. In both single-agent and multi-agent settings, we demonstrate that our approach performs at least as well as the baseline approach while offering essential advantages in flexibility and ease of use. We examine some of these advantages empirically, showing that our approach can handle domain knowledge of varying accuracy and gives the user the flexibility to express uncertainty in the provided advice.
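The abstract does not specify how an LTLf formula is turned into a shaping function. As a purely illustrative sketch, and not the paper's method, one common construction compiles the formula into a deterministic finite automaton (DFA) over sets of atomic propositions and applies potential-based reward shaping over the automaton states, F = γΦ(q′) − Φ(q). Everything below is an assumption for illustration: the example formula F(key ∧ X F door) ("eventually pick up the key, and strictly later reach the door"), its hand-built DFA (no LTLf-to-DFA compiler is used), the choice of Φ(q) as the negative shortest-path distance from q to an accepting state, and the hypothetical helper names `dfa_step`, `potential`, `shaped_reward`, and `run_trace`.

```python
from collections import deque
from itertools import chain, combinations

# Hypothetical example: atomic propositions and the accepting DFA state
# for the assumed formula F(key & X F door).
PROPS = ("key", "door")
NUM_STATES = 3
ACCEPTING = {2}


def dfa_step(q, props):
    """Hand-built DFA (assumption): 0 = nothing yet, 1 = key seen, 2 = door reached after key."""
    if q == 0:
        return 1 if "key" in props else 0
    if q == 1:
        return 2 if "door" in props else 1
    return 2  # accepting state is absorbing


def all_labels():
    """Every subset of the atomic propositions, i.e. the DFA's input alphabet."""
    subsets = chain.from_iterable(combinations(PROPS, k) for k in range(len(PROPS) + 1))
    return [frozenset(s) for s in subsets]


def distances_to_accepting(states, accepting):
    """Breadth-first search backwards from the accepting states over the DFA graph."""
    reverse = {q: set() for q in states}
    for q in states:
        for label in all_labels():
            reverse[dfa_step(q, label)].add(q)
    dist = {q: 0 for q in accepting}
    queue = deque(accepting)
    while queue:
        q = queue.popleft()
        for p in reverse[q]:
            if p not in dist:
                dist[p] = dist[q] + 1
                queue.append(p)
    return dist


DIST = distances_to_accepting(range(NUM_STATES), ACCEPTING)


def potential(q):
    """Assumed potential: closer to acceptance means higher potential."""
    # States that cannot reach acceptance get the lowest value.
    return -float(DIST.get(q, NUM_STATES))


def shaped_reward(env_reward, q, q_next, gamma=0.99):
    """Environment reward plus the potential-based shaping term."""
    return env_reward + gamma * potential(q_next) - potential(q)


def run_trace(labels, gamma=1.0):
    """Feed a trace of proposition sets through the DFA, summing shaping terms only."""
    q, total = 0, 0.0
    for label in labels:
        q_next = dfa_step(q, frozenset(label))
        total += shaped_reward(0.0, q, q_next, gamma)
        q = q_next
    return q, total


if __name__ == "__main__":
    print(DIST)
    print(run_trace([set(), {"key"}, set(), {"door"}]))
```

With γ = 1, the shaping terms along a trace telescope to Φ(final) − Φ(initial), so the trace above ends in the accepting state with a total shaping bonus of 2.0. Potential-based shaping is used here because it is known to leave the optimal policy unchanged; the actual shaping scheme in the paper may differ.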
Original language: English
Pages (from-to): 1-17
Number of pages: 17
Journal: Neural Computing & Applications
Early online date: 7 Jun 2022
Status: Published - 7 Jun 2022


Title: A Framework for Flexibly Guiding Learning Agents
