Combining Feature Extraction Pipelines for Deep Reinforcement Learning

Thesis type: Master's Thesis

Abstract

Reinforcement Learning is the research field within Machine Learning that develops algorithms which let agents learn tasks from experience, by interacting with the environment in which they are supposed to operate. The environment provides feedback by rewarding the agent, which allows the agent to reflect on the quality of its decisions and incrementally learn how to act properly in the environment. To allow Reinforcement Learning agents to learn in larger environments, regression techniques from supervised learning have been introduced in this context. These techniques approximate a function that determines the value of an action in the current state of the environment. Such a function is learned from features that describe the current state of the environment; these features are often hand-coded by human experts. General methods also exist to transform states into feature sets, but they often require fine-tuning of parameters, which demands human effort.
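As a minimal illustration of this standard setup (not spelled out in the abstract itself), the value of taking action a in state s is commonly approximated as a linear combination of a feature vector phi(s, a) with a learned weight vector theta:

\[
Q_\theta(s, a) = \theta^\top \phi(s, a)
\]

Learning then reduces to adjusting theta from the rewards the environment provides, so the quality of the approximation hinges on how informative the features phi are.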
Deep Learning has emerged as a popular research field in supervised Machine Learning. It considers the application of large Artificial Neural Networks whose main goal is to learn hierarchical layers of abstraction of the given input. This allows computers to recognise abstract and complex patterns in the data: the layers learn features of the given data in an automated fashion. Applying Deep Learning in Reinforcement Learning is known as Deep Reinforcement Learning. Deep neural networks either learn a value function for the environment directly, or their layers provide features on which function approximation can be performed.
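To make the feature-learning idea concrete, the sketch below shows a deep autoencoder whose encoder half serves as a feature-extraction pipeline. This is a minimal modern reconstruction using a Keras-style API, not the implementation from the thesis; the layer sizes and the flattened 84x84 frame dimension are illustrative assumptions.

```python
import numpy as np
from tensorflow.keras import layers, models

# Hypothetical dimensions: a flattened 84x84 grayscale frame
# compressed into a 128-dimensional feature vector.
input_dim, feature_dim = 84 * 84, 128

# Encoder: stacked dense layers learn increasingly abstract features.
inputs = layers.Input(shape=(input_dim,))
hidden = layers.Dense(1024, activation="relu")(inputs)
features = layers.Dense(feature_dim, activation="relu")(hidden)

# Decoder: reconstruct the input from the learned features.
hidden = layers.Dense(1024, activation="relu")(features)
outputs = layers.Dense(input_dim, activation="sigmoid")(hidden)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")

# Stand-in data; in practice this would be a corpus of game states.
frames = np.random.rand(32, input_dim).astype("float32")
autoencoder.fit(frames, frames, epochs=1, verbose=0)

# After training, the encoder alone acts as the feature pipeline.
encoder = models.Model(inputs, features)
state_features = encoder.predict(frames, verbose=0)
```

A pipeline for RAM states would follow the same pattern, with the 128-byte RAM vector of the Atari 2600 as input instead of a frame.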
For this thesis, we applied Deep Reinforcement Learning in the Arcade Learning Environment. The environment offers two types of state representation: frames and RAM. Previous literature only considers learning on one of these types at a time. Our research question is whether combining feature pipelines trained on both state representations improves learning and is powerful enough to replace the existing hand-coded features. Deep autoencoders were trained to provide features for each of the two state representations, and these pipelines were combined to train agents using the Sarsa(λ) algorithm. In most cases, one of the hand-coded feature sets could be outperformed; in the other cases, its performance could be matched.
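As a hedged sketch of the learning rule (the abstract names Sarsa(λ) but does not spell it out, and representing the combination as a concatenation of the two pipelines is an assumption on my part), a combined feature vector phi(s, a) = [phi_frame(s, a); phi_RAM(s, a)] would feed the standard linear Sarsa(λ) updates with eligibility traces:

\[
\begin{aligned}
\delta_t &= r_{t+1} + \gamma\, \theta^\top \phi(s_{t+1}, a_{t+1}) - \theta^\top \phi(s_t, a_t) \\
e_t &= \gamma \lambda\, e_{t-1} + \phi(s_t, a_t) \\
\theta &\leftarrow \theta + \alpha\, \delta_t\, e_t
\end{aligned}
\]

Here alpha is the learning rate, gamma the discount factor, and lambda the trace-decay parameter that gives the algorithm its name.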
Date of award: June 2016
Original language: English
Awarding institution
  • Vrije Universiteit Brussel
Supervisors: Peter Vrancx (Promotor) & Ann Nowe (Co-promotor)

Cite this
