Reference : Learning for exploration/exploitation in reinforcement learning
Dissertations and theses : Master's dissertation
Engineering, computing & technology : Computer science
Castronovo, Michaël [Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Smart grids]
Université de Liège, Liège, Belgium
Master in Computer Science, research focus
Ernst, Damien
Louveaux, Quentin
Wehenkel, Louis
Geurts, Pierre
[en] Reinforcement Learning ; Exploration/Exploitation dilemma ; Formula discovery
[en] We consider the problem of learning high-performance Exploration/Exploitation (E/E)
strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled
is assumed to be drawn from a known probability distribution p_M(·). The performance
criterion is the sum of discounted rewards collected by the E/E strategy over an
infinite-length trajectory. We propose an approach for solving this problem that
considers a rich set of candidate E/E strategies and searches for the one that gives
the best average performance on MDPs drawn according to p_M(·). As candidate E/E
strategies, we consider index-based strategies parametrized by small formulas combining
variables that include the estimated reward function, the number of times each transition
has occurred, and the optimal value functions V̂ and Q̂ of the estimated MDP (obtained
through value iteration). The search for the best formula is formalized as a multi-armed
bandit problem, each arm being associated with a formula. We experimentally compare
the performance of the approach with that of R-max and ε-Greedy strategies, and
the results are promising.
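The approach summarized in the abstract can be sketched as below. This is a minimal, hypothetical illustration, not the thesis's implementation: the prior p_M (random 3-state, 2-action MDPs with deterministic rewards in [0, 1)), the candidate formulas in FORMULAS, the 30-step horizon used to truncate the infinite-length trajectory, and the choice of UCB1 as the bandit algorithm are all assumptions made for the example. The sketch builds the estimated MDP from transition counts, computes Q̂ and V̂ by value iteration, acts greedily with respect to a formula-based index, and treats each formula as a bandit arm whose pull scores the strategy on a freshly sampled MDP.

```python
import math
import random

# Toy setting (assumption): these sizes are illustrative, not taken from the thesis.
N_STATES, N_ACTIONS, GAMMA, HORIZON = 3, 2, 0.9, 30


def sample_mdp(rng):
    """Draw an MDP from an assumed prior p_M: random transitions, deterministic rewards in [0, 1)."""
    P = []
    for _ in range(N_STATES):
        row = []
        for _ in range(N_ACTIONS):
            w = [rng.random() for _ in range(N_STATES)]
            row.append([x / sum(w) for x in w])
        P.append(row)
    R = [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    return P, R


def estimated_mdp(counts, reward_sums):
    """Maximum-likelihood estimate; unvisited (s, a) pairs get uniform transitions and reward 0."""
    P_hat, R_hat = [], []
    for s in range(N_STATES):
        P_row, R_row = [], []
        for a in range(N_ACTIONS):
            n = sum(counts[s][a])
            P_row.append([c / n for c in counts[s][a]] if n else [1.0 / N_STATES] * N_STATES)
            R_row.append(reward_sums[s][a] / n if n else 0.0)
        P_hat.append(P_row)
        R_hat.append(R_row)
    return P_hat, R_hat


def value_iteration(P, R, iters=30):
    """Approximate Q-hat and V-hat with a fixed number of Bellman optimality backups."""
    V = [0.0] * N_STATES
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(iters):
        Q = [[R[s][a] + GAMMA * sum(p * v for p, v in zip(P[s][a], V))
              for a in range(N_ACTIONS)] for s in range(N_STATES)]
        V = [max(q) for q in Q]
    return Q, V


# Hypothetical candidate formulas over (r_hat(s,a), n(s,a), Q_hat(s,a), V_hat(s)).
FORMULAS = {
    "Q": lambda r, n, q, v: q,
    "Q + 1/(n+1)": lambda r, n, q, v: q + 1.0 / (n + 1),
    "r + 1/sqrt(n+1)": lambda r, n, q, v: r + 1.0 / math.sqrt(n + 1),
    "(Q - V) + 2/(n+1)": lambda r, n, q, v: (q - v) + 2.0 / (n + 1),
}


def run_strategy(formula, mdp, rng):
    """Play one trajectory greedily w.r.t. the formula's index; return (1 - gamma) * discounted return."""
    P, R = mdp
    counts = [[[0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    reward_sums = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    s, ret = 0, 0.0
    for t in range(HORIZON):  # finite horizon truncates the infinite-length trajectory
        P_hat, R_hat = estimated_mdp(counts, reward_sums)
        Q_hat, V_hat = value_iteration(P_hat, R_hat)
        n_sa = [sum(counts[s][b]) for b in range(N_ACTIONS)]
        a = max(range(N_ACTIONS),
                key=lambda b: formula(R_hat[s][b], n_sa[b], Q_hat[s][b], V_hat[s]))
        s_next = rng.choices(range(N_STATES), weights=P[s][a])[0]
        counts[s][a][s_next] += 1
        reward_sums[s][a] += R[s][a]
        ret += GAMMA ** t * R[s][a]
        s = s_next
    return (1 - GAMMA) * ret  # rewards in [0, 1) keep this in [0, 1), as UCB1 expects


def ucb1_formula_search(budget, seed=0):
    """Each formula is a bandit arm; one pull samples an MDP from p_M and scores the strategy."""
    rng = random.Random(seed)
    names = list(FORMULAS)
    pulls = {k: 0 for k in names}
    means = {k: 0.0 for k in names}
    for _ in range(budget):
        untried = [k for k in names if pulls[k] == 0]
        total = sum(pulls.values())
        arm = untried[0] if untried else max(
            names, key=lambda k: means[k] + math.sqrt(2 * math.log(total) / pulls[k]))
        score = run_strategy(FORMULAS[arm], sample_mdp(rng), rng)
        pulls[arm] += 1
        means[arm] += (score - means[arm]) / pulls[arm]
    best = max(names, key=lambda k: pulls[k])  # recommend the most-played arm
    return best, pulls, means


if __name__ == "__main__":
    best, pulls, means = ucb1_formula_search(budget=60)
    print("selected formula:", best)
    for k in FORMULAS:
        print(f"{k:>18}: pulls={pulls[k]:3d}  mean score={means[k]:.3f}")
```

Scores are scaled by (1 - γ) so that they fall in [0, 1), the range UCB1's confidence bonus assumes. Recommending the most-played arm rather than the arm with the best empirical mean is one common choice; either rule could be used here.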
Researchers ; Students

File(s) associated with this reference

Fulltext file(s):

main.pdf (Author postprint, 599.97 kB, open access)

Additional material(s):

main.pdf (183.78 kB, open access)

All documents in ORBi are protected by a user license.