Paper published in a book (Scientific congresses and symposiums)
Learning exploration/exploitation strategies for single trajectory reinforcement learning
Castronovo, Michaël; Maes, Francis; Fonteneau, Raphaël et al.
2012, in Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012)
Peer reviewed
 

Files


Full Text
castronovo12a.pdf
Publisher postprint (270.86 kB)
Details



Keywords :
reinforcement learning; Exploration/Exploitation dilemma; formula discovery
Abstract :
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach for solving this problem that considers a rich set of candidate E/E strategies and looks for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V and Q of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
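Illustrative sketch (not taken from the paper): the pipeline described in the abstract could look roughly like the Python below. Everything concrete here is an assumption made for illustration: `draw_mdp` is a hypothetical stand-in for the distribution over MDPs, the three entries of `FORMULAS` are invented candidates, UCB1 is one possible bandit algorithm (the abstract does not name one), the trajectory is truncated at a finite horizon, rewards are deterministic, and the discount factor 0.95 is arbitrary.

```python
import math
import random

GAMMA = 0.95  # discount factor (assumed value)

def value_iteration(P, R, n_s, n_a, iters=50):
    """Approximate (Q, V) for transitions P[s][a][s2] and rewards R[s][a]."""
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(iters):
        for s in range(n_s):
            for a in range(n_a):
                Q[s][a] = R[s][a] + GAMMA * sum(P[s][a][y] * V[y] for y in range(n_s))
        V = [max(Q[s]) for s in range(n_s)]
    return Q, V

def draw_mdp(rng, n_s=3, n_a=2):
    """Hypothetical stand-in for the MDP distribution: random P, rewards in [0, 1]."""
    P = []
    for _ in range(n_s):
        rows = []
        for _ in range(n_a):
            w = [rng.random() for _ in range(n_s)]
            rows.append([x / sum(w) for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_a)] for _ in range(n_s)]
    return P, R

# Candidate index formulas over (estimated reward, visit count, V(s), Q(s, a)).
# These three are made up for the example.
FORMULAS = {
    "Q": lambda r, n, v, q: q,
    "Q + 1/(1+N)": lambda r, n, v, q: q + 1.0 / (1 + n),
    "R + V/(1+N)": lambda r, n, v, q: r + v / (1 + n),
}

def run_strategy(formula, P, R, rng, horizon=30):
    """Normalized discounted return of one index-based strategy on one trajectory."""
    n_s, n_a = len(P), len(P[0])
    counts = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    r_hat = [[0.0] * n_a for _ in range(n_s)]
    s, ret = 0, 0.0
    for t in range(horizon):
        # Estimated MDP: smoothed empirical transition frequencies.
        P_hat = [[[(c + 1) / (sum(row) + n_s) for c in row] for row in counts[x]]
                 for x in range(n_s)]
        Q, V = value_iteration(P_hat, r_hat, n_s, n_a)
        visits = [sum(counts[s][a]) for a in range(n_a)]
        # Index-based choice: play the action with the highest formula value.
        a = max(range(n_a),
                key=lambda u: formula(r_hat[s][u], visits[u], V[s], Q[s][u]))
        s2 = rng.choices(range(n_s), weights=P[s][a])[0]
        counts[s][a][s2] += 1
        r_hat[s][a] = R[s][a]  # deterministic rewards in this toy model
        ret += (GAMMA ** t) * R[s][a]
        s = s2
    return (1 - GAMMA) * ret  # rescaled to [0, 1) so UCB1 bounds apply

def search_formulas(formulas, n_pulls, seed=0):
    """UCB1 over formulas: each pull draws a fresh MDP and runs one trajectory."""
    rng = random.Random(seed)
    names = list(formulas)
    pulls = {k: 0 for k in names}
    total = {k: 0.0 for k in names}
    for t in range(1, n_pulls + 1):
        untried = [k for k in names if pulls[k] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(names, key=lambda k: total[k] / pulls[k]
                      + math.sqrt(2 * math.log(t) / pulls[k]))
        P, R = draw_mdp(rng)
        total[arm] += run_strategy(formulas[arm], P, R, rng)
        pulls[arm] += 1
    best = max(names, key=lambda k: total[k] / pulls[k])
    return best, pulls
```

Calling `search_formulas(FORMULAS, n_pulls=40)` returns the name of the formula with the best empirical mean return together with the number of pulls each arm received. With so few pulls the ranking is noisy, so this only illustrates the structure of the search, not its results.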
Disciplines :
Computer science
Author, co-author :
Castronovo, Michaël ;  Université de Liège - ULiège > 2e an. master sc. infor., fin. appr.
Maes, Francis ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Fonteneau, Raphaël ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Ernst, Damien ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Learning exploration/exploitation strategies for single trajectory reinforcement learning
Publication date :
2012
Event name :
10th European Workshop on Reinforcement Learning (EWRL 2012)
Event place :
Edinburgh, United Kingdom
Event date :
June 30-July 1, 2012
Audience :
International
Main work title :
Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012)
Collection name :
JMLR Workshop and Conference Proceedings 24
Pages :
1-9
Peer reviewed :
Peer reviewed
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Available on ORBi :
since 26 July 2012

Statistics


Number of views
354 (36 by ULiège)
Number of downloads
221 (16 by ULiège)
