Abstract
In the stochastic multivariate multi-armed bandit, arms generate a vector of stochastic normal rewards, one per objective, instead of a single scalar reward. As a result, there is not a single optimal arm but a set of optimal arms (the Pareto front), identified using the Pareto dominance relation. The goal of an agent is to trade off exploration and exploitation: exploration means finding the Pareto front, and exploitation means selecting the optimal arms fairly, i.e. evenly. We propose the annealing-Pareto algorithm, which trades off exploration and exploitation by combining a decaying parameter $\epsilon_{t}$ with the Pareto dominance relation. We compare Pareto-KG, Pareto-UCB1, and annealing-Pareto experimentally on multi-objective normal distributions and conclude that annealing-Pareto is the best-performing algorithm.
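The two ingredients of the abstract — the Pareto dominance relation over reward vectors and an $\epsilon_{t}$-decaying exploration schedule — can be sketched as follows. This is a minimal illustration, not the authors' exact algorithm; the decay schedule `eps0 / (1 + decay * t)` and the function names are assumptions for the example.

```python
import random

def dominates(u, v):
    """True if reward vector u Pareto-dominates v: u is at least as
    good in every objective and strictly better in at least one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))

def pareto_front(means):
    """Indices of arms whose estimated mean reward vectors are not
    dominated by any other arm's."""
    return [i for i, u in enumerate(means)
            if not any(dominates(v, u) for j, v in enumerate(means) if j != i)]

def select_arm(est_means, t, eps0=1.0, decay=0.01):
    """Annealed selection sketch: with probability eps_t explore a
    uniformly random arm; otherwise exploit by picking uniformly among
    the empirically Pareto-optimal arms (fair/even exploitation).
    The specific decay schedule here is an assumption."""
    eps_t = eps0 / (1.0 + decay * t)  # decaying exploration parameter
    if random.random() < eps_t:
        return random.randrange(len(est_means))
    return random.choice(pareto_front(est_means))
```

As $t$ grows, $\epsilon_{t}$ shrinks, so play concentrates on the arms that are currently non-dominated, while the uniform choice within the front keeps the optimal arms sampled evenly.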
| Original language | English |
| --- | --- |
| Title of host publication | The 7th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), PhD track |
| Publication status | Published - 2014 |

Publication series

| Name |
| --- |
| The 7th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), PhD track |
Keywords
- Multi-armed bandit problem
- Multi-objective optimization
- Annealing algorithm
- Exploration/exploitation