FWO Postdoctoral Fellowship Diederik Roijers
Head of the research unit: Prof. Dr. Ann Nowé
In this project, we develop new methods for decision support in problems with multiple objectives. Specifically, we focus on the case where the dynamics of the problem are stochastic and unknown. For example, imagine a highway where we control the maximum speed and the opening and closing of "rush hour lanes". In this problem, we aim to maximize throughput while minimizing congestion, environmental impact, and noise. Because the precise effects of each control action are initially unknown, we need to learn suitable control policies through interaction with the system. Furthermore, because there are multiple objectives, there is typically no single optimal policy. Therefore, we aim to provide decision support by presenting the user with alternative policies and their value estimates w.r.t. the different objectives.
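To make the idea of "alternative policies" concrete, the following minimal sketch filters a set of candidate policies down to those that are not Pareto-dominated. It is an illustration only, not the project's method: the policy names and value estimates are hypothetical, every objective is written so that higher is better (costs are negated), and Pareto dominance is used on the assumption that the user's utility increases in each objective.

```python
from typing import Dict, List, Tuple

Value = Tuple[float, ...]


def dominates(a: Value, b: Value) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: Dict[str, Value]) -> List[str]:
    """Return the names of all policies that no other policy dominates."""
    return [
        name
        for name, v in values.items()
        if not any(dominates(w, v) for other, w in values.items() if other != name)
    ]


# Hypothetical value estimates per policy:
# (throughput, -congestion, -environmental impact, -noise)
estimates = {
    "low_limit_lanes_closed": (0.50, -0.40, -0.20, -0.30),
    "high_limit_lanes_open": (0.90, -0.30, -0.70, -0.80),
    "high_limit_lanes_closed": (0.60, -0.60, -0.70, -0.80),
    "mid_limit_lanes_open": (0.75, -0.35, -0.45, -0.50),
}

front = pareto_front(estimates)
print(front)
```

In this made-up data, "high_limit_lanes_closed" is worse than or equal to "high_limit_lanes_open" in every objective, so it would not be shown to the user; the remaining three policies each represent a different trade-off.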
We put the user at the center of our research. We take a utility-based approach and model what we know about the user's possible preferences, i.e., her utility function. Given this (typically incomplete) information, we aim to provide guarantees w.r.t. the loss of utility for this user compared to the theoretical but unknown maximum. We propose two modes for doing this: in the first, we learn a set of alternative policies by interacting with the stochastic environment, and present this set to the user. In the second, we also allow interaction with the user during learning, in order to elicit her preferences and improve the efficiency and output of the learning process.
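The notion of utility loss can be illustrated with a small sketch. It rests on several assumptions that are not part of the project description: the utility function is taken to be linear (a weighted sum of the objectives), the unknown weights are approximated by a coarse grid, and a hypothetical reference set stands in for the true maximum, which is unknown in practice. All names and numbers are illustrative.

```python
from typing import List, Sequence, Tuple

Value = Tuple[float, ...]


def scalarize(value: Sequence[float], weights: Sequence[float]) -> float:
    """Linear utility (an assumption): weighted sum of the objective values."""
    return sum(w * v for w, v in zip(weights, value))


def max_utility_loss(
    presented: List[Value],
    reference: List[Value],
    weight_grid: List[Value],
) -> float:
    """Largest gap, over the weight grid, between the best reference policy
    and the best presented policy."""
    worst = 0.0
    for w in weight_grid:
        best_reference = max(scalarize(v, w) for v in reference)
        best_presented = max(scalarize(v, w) for v in presented)
        worst = max(worst, best_reference - best_presented)
    return worst


# Hypothetical two-objective value vectors (higher is better in both).
reference = [(1.0, 0.0), (0.7, 0.7), (0.0, 1.0)]  # stand-in for the unknown optimum
presented = [(1.0, 0.0), (0.0, 1.0)]              # set shown to the user

# Grid over weights (w, 1 - w); a stand-in for incomplete preference knowledge.
weight_grid = [(i / 10, 1 - i / 10) for i in range(11)]

loss = max_utility_loss(presented, reference, weight_grid)
print(loss)
```

Here the presented set omits the balanced policy, so a user who weighs both objectives equally loses the most utility. Narrowing the weight grid, for example after eliciting preferences from the user, can only keep this bound the same or shrink it, which is one way to read the benefit of the second, interactive mode.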