Multivariate Normal Distribution Based Multi-Armed Bandit Pareto Algorithm

Saba Yahyaa, Madalina Drugan, Bernard Manderick

Research output: Chapter in Book/Report/Conference proceedingMeeting abstract (Book)Research

Abstract

In the stochastic multivariate multi-armed bandit,
arms generate a vector of stochastic normal rewards, one per objective, instead of a single scalar reward.
As a result, there is not only one optimal arm,
but there is a set of optimal arms (Pareto front)
using
Pareto dominance relation.
The goal of an agent is to trade-off between exploration and exploitation.
Exploration means finding the Pareto front and exploitation means
selecting fairly or evenly the optimal arms.
We propose annealing-Pareto algorithm that
trades-off between exploration and exploitation
by using a decaying parameter $\epsilon_{t}$ in combination with
Pareto dominance relation.
We compare experimentally Pareto-KG, Pareto-UCB1 and annealing-Pareto
on multi-objective normal distributions and we conclude that the annealing-Pareto is the best performing algorithm.
Original languageEnglish
Title of host publicationthe 7th European conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (EMCL/PKDD), PhD track
Publication statusPublished - 2014

Publication series

Namethe 7th European conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (EMCL/PKDD), PhD track

Keywords

  • Multi armed bandit problem
  • multi objective optimization
  • annealing algorithm
  • exploration/exploitation

Fingerprint

Dive into the research topics of 'Multivariate Normal Distribution Based Multi-Armed Bandit Pareto Algorithm'. Together they form a unique fingerprint.

Cite this