Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

Research output: Unpublished paper



Today's deep reinforcement learning (RL) algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distillation algorithm, based on the CN2 rule mining algorithm, that distills the deep policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states to individual actions, but also produces extra meta-information, such as lists of visited states or action probabilities. This meta-information can, for example, indicate whether more than one action is near-optimal for a certain state. We exploit knowledge about these equally good actions to distill the policy into fewer rules, which contributes to interpretability, while ensuring that the performance of the distilled policy still matches that of the original policy. This ensures that we do not provide an explanation for a degenerate or over-simplified policy. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern deep reinforcement learning algorithms. The explanations we produce capture the learned policy in only a few rules, and can be further refined and tailored by the user with a two-step process that we introduce in this paper.
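The idea of labelling each state with a set of equally good actions, rather than a single action, can be illustrated with a minimal sketch. This is not the CN2-based algorithm from the paper: the function names, the `tolerance` threshold, and the example probabilities are all hypothetical, chosen only to show why set-valued labels let a single rule cover more states.

```python
from typing import FrozenSet, List, Sequence


def near_optimal_actions(probs: Sequence[float], tolerance: float = 0.1) -> FrozenSet[int]:
    """Return the actions whose probability is within `tolerance` of the best one.

    `tolerance` is an assumed threshold, not a value taken from the paper.
    """
    best = max(probs)
    return frozenset(i for i, p in enumerate(probs) if best - p <= tolerance)


def coverage(action: int, labels: List[FrozenSet[int]]) -> int:
    """Count the states for which `action` is among the acceptable actions."""
    return sum(1 for acceptable in labels if action in acceptable)


# Hypothetical action probabilities for four states and three actions.
policy_probs = [
    [0.48, 0.47, 0.05],
    [0.10, 0.85, 0.05],
    [0.45, 0.50, 0.05],
    [0.90, 0.05, 0.05],
]

# Set-valued labels: every near-optimal action is acceptable.
set_labels = [near_optimal_actions(p) for p in policy_probs]

# Single-valued labels: only the most probable action is acceptable.
argmax_labels = [
    frozenset({max(range(len(p)), key=p.__getitem__)}) for p in policy_probs
]

# A rule that always predicts action 1 is correct on more states
# when near-optimal alternatives count as correct.
print(coverage(1, set_labels), coverage(1, argmax_labels))
```

In this toy setting, a rule predicting action 1 agrees with more states under set-valued labels than under argmax labels, which is the intuition behind needing fewer rules to cover the same policy.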
Original language: English
Number of pages: 16
Status: Published - 4 Sep 2020
Event: 1st TAILOR Workshop at ECAI 2020
Duration: 4 Sep 2020 to 5 Sep 2020


Workshop: 1st TAILOR Workshop at ECAI 2020

