Master’s dissertation (Dissertations and theses)
Learning for exploration/exploitation in reinforcement learning
Castronovo, Michaël
2012
 

Files


Full Text
main.pdf
Author postprint (614.37 kB)
Annexes
main.pdf
Publisher postprint (188.19 kB)





Details



Keywords :
Reinforcement Learning; Exploration/Exploitation dilemma; Formula discovery
Abstract :
We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is assumed to be drawn from a known probability distribution p_M(·). The performance criterion is the sum of discounted rewards collected by the E/E strategy over an infinite-length trajectory. We propose an approach that considers a rich set of candidate E/E strategies and searches for the one that gives the best average performance on MDPs drawn according to p_M(·). As candidate E/E strategies, we consider index-based strategies parametrized by small formulas combining variables that include the estimated reward function, the number of times each transition has occurred, and the optimal value functions V̂ and Q̂ of the estimated MDP (obtained through value iteration). The search for the best formula is formalized as a multi-armed bandit problem, each arm being associated with a formula. We experimentally compare the performance of the approach with R-max as well as with ε-Greedy strategies, and the results are promising.
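The search described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the dissertation: the toy prior in `sample_mdp` stands in for p_M(·), the three entries in `FORMULAS` are hypothetical candidate formulas, UCB1 is one possible bandit algorithm (the abstract does not name one), rewards are deterministic, and the infinite-horizon discounted return is approximated by a truncated trajectory.

```python
import math
import random

GAMMA = 0.9  # discount factor (assumed value)

def sample_mdp(rng, n_states=3, n_actions=2):
    """Draw a small random MDP; this toy prior stands in for p_M."""
    P, R = [], []
    for _ in range(n_states):
        P_s, R_s = [], []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            total = sum(w)
            P_s.append([x / total for x in w])  # transition probabilities
            R_s.append(rng.random())            # deterministic reward in [0, 1]
        P.append(P_s)
        R.append(R_s)
    return P, R

def value_iteration(P, R, sweeps=20):
    """Approximate the optimal Q-function of a model with a fixed number of sweeps."""
    nS, nA = len(P), len(P[0])
    Q = [[0.0] * nA for _ in range(nS)]
    for _ in range(sweeps):
        V = [max(q) for q in Q]
        Q = [[R[s][a] + GAMMA * sum(P[s][a][t] * V[t] for t in range(nS))
              for a in range(nA)] for s in range(nS)]
    return Q

# Hypothetical candidate formulas over (estimated reward, visit count, Q-hat).
FORMULAS = {
    "q_hat": lambda r, n, q: q,
    "r_hat": lambda r, n, q: r,
    "q_hat + 1/sqrt(n+1)": lambda r, n, q: q + 1.0 / math.sqrt(n + 1),
}

def run_episode(formula, P, R, rng, horizon=30):
    """Play the index-based strategy induced by `formula`; return the truncated discounted return."""
    nS, nA = len(P), len(P[0])
    counts = [[[0] * nS for _ in range(nA)] for _ in range(nS)]
    r_hat = [[0.0] * nA for _ in range(nS)]  # 0 until the pair is visited
    s, ret, disc = 0, 0.0, 1.0
    for _ in range(horizon):
        n = [[sum(counts[i][a]) for a in range(nA)] for i in range(nS)]
        # Empirical transition model; uniform for unvisited state-action pairs.
        P_hat = [[[c / n[i][a] for c in counts[i][a]] if n[i][a] else [1.0 / nS] * nS
                  for a in range(nA)] for i in range(nS)]
        Q_hat = value_iteration(P_hat, r_hat)
        # Act greedily with respect to the formula's index.
        a = max(range(nA), key=lambda b: formula(r_hat[s][b], n[s][b], Q_hat[s][b]))
        ret += disc * R[s][a]
        disc *= GAMMA
        nxt = rng.choices(range(nS), weights=P[s][a])[0]
        counts[s][a][nxt] += 1
        r_hat[s][a] = R[s][a]  # rewards are deterministic in this sketch
        s = nxt
    return ret

def ucb1_search(budget, seed=0):
    """Allocate `budget` episodes across formulas with UCB1; each arm is one formula."""
    rng = random.Random(seed)
    names = list(FORMULAS)
    pulls = {k: 0 for k in names}
    mean = {k: 0.0 for k in names}
    for t in range(1, budget + 1):
        untried = [k for k in names if pulls[k] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(names, key=lambda k: mean[k] + math.sqrt(2 * math.log(t) / pulls[k]))
        P, R = sample_mdp(rng)  # a fresh MDP drawn from the prior for each pull
        g = (1 - GAMMA) * run_episode(FORMULAS[arm], P, R, rng)  # scaled to [0, 1]
        pulls[arm] += 1
        mean[arm] += (g - mean[arm]) / pulls[arm]
    best = max(names, key=lambda k: mean[k])
    return best, mean, pulls

if __name__ == "__main__":
    best, mean, pulls = ucb1_search(budget=60)
    print(best, pulls)
```

Each pull draws a new MDP from the prior, so an arm's empirical mean estimates the average performance of its formula over the distribution rather than on a single MDP. That average is the quantity the abstract proposes to maximize.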
Disciplines :
Computer science
Author, co-author :
Castronovo, Michaël ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Learning for exploration/exploitation in reinforcement learning
Defense date :
June 2012
Number of pages :
51
Institution :
ULiège - Université de Liège
Degree :
Master en sciences informatiques, à finalité approfondie
Promotor :
Ernst, Damien  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Louveaux, Quentin  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Wehenkel, Louis  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Geurts, Pierre  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Available on ORBi :
since 09 October 2012

Statistics


Number of views
160 (18 by ULiège)
Number of downloads
228 (15 by ULiège)
