Paper published in a book (Scientific congresses and symposiums)
Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison
Castronovo, Michaël; Ernst, Damien; Fonteneau, Raphaël
2014, in Proceedings of the 23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Peer reviewed
 

Files


Full Text
BAMCP_vs_OPPS_v3.pdf
Author postprint (393.06 kB)



Details



Keywords :
Reinforcement Learning
Abstract :
[en] This paper addresses the problem of decision making in unknown finite Markov decision processes (MDPs). The uncertainty about the MDPs is modeled using a prior distribution over a set of candidate MDPs. The performance criterion is the expected sum of discounted rewards collected over an infinite-length trajectory. Time constraints are defined as follows: (i) an off-line phase with a given time budget can be used to exploit the prior distribution, and (ii) at every time step of the on-line phase, decisions have to be computed within a given time budget. In this setting, we compare two decision-making strategies: OPPS, a recently proposed meta-learning scheme which mainly exploits the off-line phase to perform policy search, and BAMCP, a state-of-the-art model-based Bayesian reinforcement learning algorithm which mainly exploits the on-line time budget. We empirically compare these approaches in a real Bayesian setting by computing their performance over a large set of problems; to the best of our knowledge, this is the first time such a comparison has been carried out in the reinforcement learning literature. Several settings are considered by varying the prior distribution and the distribution from which test problems are drawn. The main finding of these experiments is that there may be a significant benefit to having an off-line prior-based optimization phase in the case of informative and accurate priors, especially when on-line time constraints are tight.
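For reference, a minimal LaTeX sketch of the performance criterion described in the abstract; the notation (M for a candidate MDP drawn from the prior p, \pi for a decision-making strategy, \gamma for the discount factor, r_t for the reward collected at step t) is assumed here for illustration and is not taken from the paper itself:

    % Expected discounted return of strategy \pi on a fixed MDP M (infinite horizon)
    J^{\pi}_{M} = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, M, \pi \right], \qquad 0 \le \gamma < 1
    % Bayesian performance: the same return averaged over the prior distribution on candidate MDPs
    J^{\pi}_{p} = \mathbb{E}_{M \sim p(\cdot)}\!\left[ J^{\pi}_{M} \right]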
Disciplines :
Computer science
Author, co-author :
Castronovo, Michaël ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Smart grids
Ernst, Damien ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Smart grids
Fonteneau, Raphaël ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Language :
English
Title :
Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison
Alternative titles :
[fr] Apprentissage par renforcement bayésien versus recherche directe de politique hors-ligne en utilisant une distribution a priori: comparaison empirique
Publication date :
June 2014
Event name :
23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Event place :
Brussels, Belgium
Event date :
June 2014
Audience :
International
Main work title :
Proceedings of the 23rd annual machine learning conference of Belgium and the Netherlands (BENELEARN 2014)
Peer reviewed :
Peer reviewed
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
CÉCI - Consortium des Équipements de Calcul Intensif [BE]
Available on ORBi :
since 07 May 2014

Statistics


Number of views
356 (87 by ULiège)
Number of downloads
218 (29 by ULiège)
