The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Research output: Chapter in Book/Report/Conference proceeding › Conference paper

Abstract

Partially Observable Markov Decision Processes (POMDPs) are used to model environments where the full state cannot be perceived by an agent. The agent must therefore reason over its past observations and actions. However, simply remembering the full history is generally intractable because the history space grows exponentially. A probability distribution that models the belief over the true state can serve as a sufficient statistic of the history, but computing it requires access to the model of the environment and is often intractable. State-of-the-art algorithms use recurrent neural networks to compress the observation-action history with the aim of learning a sufficient statistic, but they lack guarantees of success and can lead to sub-optimal policies. To overcome this, we propose the Wasserstein Belief Updater, an RL algorithm that learns a latent model of the POMDP and an approximation of the belief update. Our approach comes with theoretical guarantees on the quality of the approximation, ensuring that the beliefs it outputs allow for learning the optimal value function.
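For readers unfamiliar with the belief update mentioned in the abstract, the following is a minimal sketch of the textbook exact Bayes filter for a finite POMDP. It is not the Wasserstein Belief Updater and is not taken from the paper; it only illustrates why the exact update needs the environment model. The names `belief_update`, `T`, and `O`, and the two-state numbers, are assumptions made for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Exact Bayes filter for a finite POMDP (illustrative sketch).

    belief[s]    : current probability of being in state s
    T[a][s][s2]  : probability of moving from s to s2 under action a
    O[a][s2][o]  : probability of observing o after reaching s2 via a
    """
    n = len(belief)
    # Prediction step: push the belief through the transition model.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n))
        for s2 in range(n)
    ]
    # Correction step: weight each next state by the observation likelihood.
    unnormalized = [
        O[action][s2][observation] * predicted[s2] for s2 in range(n)
    ]
    evidence = sum(unnormalized)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under this belief")
    # Normalize so the new belief is a probability distribution.
    return [p / evidence for p in unnormalized]


# Hypothetical two-state model: one action (0), two observations (0, 1).
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
new_belief = belief_update([0.5, 0.5], action=0, observation=1, T=T, O=O)
```

Both `T` and `O` must be known to run this update, and the sums range over the entire state space, which is the access and tractability issue the abstract refers to.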
Original language: English
Title of host publication: The Twelfth International Conference on Learning Representations
Subtitle of host publication: ICLR 2024
Publisher: OpenReview.net
Number of pages: 9
Publication status: Published - 7 May 2024
Event: The Twelfth International Conference on Learning Representations - Messe Wien Exhibition and Congress Center, Vienna, Austria
Duration: 7 May 2024 to 11 May 2024
Conference number: 12
https://iclr.cc

Conference

Conference: The Twelfth International Conference on Learning Representations
Abbreviated title: ICLR 2024
Country/Territory: Austria
City: Vienna
Period: 7/05/24 to 11/05/24

Keywords

  • Reinforcement Learning
  • POMDP
  • Model-based
  • Representation Learning
