Abstract
Partially Observable Markov Decision Processes (POMDPs) are useful tools for modeling environments in which the agent cannot perceive the full state. As such, the agent must reason by taking into account its past observations and actions. However, simply remembering the full history is generally intractable, because the history space grows exponentially. A probability distribution over the true state, called the belief, can serve as a sufficient statistic of the history, but computing it requires access to a model of the environment and is also intractable. Current state-of-the-art algorithms use Recurrent Neural Networks (RNNs) to compress the observation-action history in the hope of learning a sufficient statistic, but they lack guarantees of success and can lead to suboptimal policies. To overcome this, we propose the Wasserstein-Belief-Updater (WBU), an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of this approximation, ensuring that the beliefs it outputs allow the optimal value function to be learned.
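For reference, the exact belief mentioned in the abstract is updated with Bayes' rule after each action a and observation o: b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The minimal Python sketch below shows this textbook Bayes-filter update for a small finite POMDP. It illustrates why the update needs the transition model T and the observation model O, and why it sums over every state. It is not the WBU algorithm. The function name `belief_update`, the dictionary encoding of T and O, and the two-state "listen" example (in the style of the classic tiger problem) are hypothetical illustrations.

```python
from collections.abc import Hashable, Mapping
from typing import Dict

State = Hashable
Action = Hashable
Obs = Hashable


def belief_update(
    belief: Mapping[State, float],
    action: Action,
    obs: Obs,
    transition: Mapping[tuple, Mapping[State, float]],  # hypothetical encoding: T[(s, a)][s'] = P(s' | s, a)
    observation: Mapping[tuple, Mapping[Obs, float]],   # hypothetical encoding: O[(s', a)][o] = P(o | s', a)
) -> Dict[State, float]:
    """Exact Bayes-filter belief update for a finite POMDP (not WBU)."""
    new_belief: Dict[State, float] = {}
    # Predict: push the current belief through the transition model.
    for s, p_s in belief.items():
        for s_next, p_next in transition[(s, action)].items():
            new_belief[s_next] = new_belief.get(s_next, 0.0) + p_s * p_next
    # Correct: weight each successor state by the likelihood of the observation.
    for s_next in new_belief:
        new_belief[s_next] *= observation[(s_next, action)].get(obs, 0.0)
    # Normalize so the result is again a probability distribution.
    norm = sum(new_belief.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / norm for s, p in new_belief.items()}


# Hypothetical two-state example: "listen" never changes the state and
# reports the correct side with probability 0.85.
T = {
    ("left", "listen"): {"left": 1.0},
    ("right", "listen"): {"right": 1.0},
}
O = {
    ("left", "listen"): {"hear_left": 0.85, "hear_right": 0.15},
    ("right", "listen"): {"hear_left": 0.15, "hear_right": 0.85},
}
b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "hear_left", T, O)  # about {"left": 0.85, "right": 0.15}
```

Every update loops over all states and cannot be computed at all without T and O. This is the cost and model dependence that the abstract cites as motivation for learning an approximate belief update in a latent space instead.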
Original language | English |
---|---|
Status | Published - 18 Nov 2024 |
Event | BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning - Jaarbeurs Supernova, Utrecht, Netherlands. Duration: 18 Nov 2024 → 20 Nov 2024. Conference number: 36. https://bnaic2024.sites.uu.nl/ |
Conference
Conference | BNAIC/BeNeLearn 2024: Joint International Scientific Conferences on AI and Machine Learning |
---|---|
Short title | BNAIC/BeNeLearn 2024 |
Country/Region | Netherlands |
City | Utrecht |
Period | 18/11/24 → 20/11/24 |
Internet address | https://bnaic2024.sites.uu.nl/ |
Projects
- 2 Active
-
VLAAI1: Vlaams Artificiële Intelligentie Onderzoeksprogramma (VAIOP) (Flemish Artificial Intelligence Research Programme) – second cycle
1/01/24 → 31/12/28
Project: Applied
-
iBOF/21/027: DESCARTES - infectious disease economics and artificial intelligence with guarantees
Nowé, A., Hens, N. & Beutels, P.
1/01/21 → 31/12/26
Project: Fundamental
-
Activating formal verification of deep reinforcement learning policies by model checking bisimilar latent space models
Delgrange, F., 2024, VUB Press. 348 p. Research output: PhD Thesis
Open Access
-
The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models
Avalos, R., Delgrange, F., Nowé, A., Pérez, G. A. & Roijers, D. M., 7 May 2024, The Twelfth International Conference on Learning Representations: ICLR 2024. OpenReview.net, 9 p. Research output: Conference paper
Open Access
-
The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models
Avalos, R., Delgrange, F., Nowé, A., Pérez, G. A. & Roijers, D. M., 29 May 2023, In: Proc. of the Adaptive and Learning Agents Workshop (ALA 2023). p. 1-21, 21 p., 52. Research output: Conference paper
Open Access