Reinforcement Learning; Exploration/Exploitation dilemma; Formula discovery
Abstract :
We consider the problem of learning high-performance Exploration/Exploitation (E/E)
strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled
is assumed to be drawn from a known probability distribution p_M(·). The performance
criterion is the sum of discounted rewards collected by the E/E strategy over an
infinite-length trajectory. We propose an approach to this problem that considers
a rich set of candidate E/E strategies and searches for the one that gives
the best average performance on MDPs drawn according to p_M(·). As candidate E/E
strategies, we consider index-based strategies parametrized by small formulas combining
variables that include the estimated reward function, the number of times each transition
has occurred, and the optimal value functions V̂ and Q̂ of the estimated MDP (obtained
through value iteration). The search for the best formula is formalized as a multi-armed
bandit problem, each arm being associated with a formula. We experimentally compare
the performance of the approach with R-max as well as with ε-Greedy strategies, and
the results are promising.
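
The following Python sketch illustrates the idea described in the abstract: each candidate formula defines an index over actions, and a bandit algorithm allocates evaluation episodes among the formulas. It is a toy illustration under stated assumptions, not the thesis's implementation. The MDP sampler `draw_mdp` (a stand-in for p_M), the four candidate formulas, the truncation horizon, the discount factor, and the use of UCB1 as the bandit algorithm are all assumptions introduced here for illustration.

```python
import math
import random

GAMMA = 0.9    # discount factor (assumed value)
HORIZON = 20   # truncation of the infinite trajectory (assumption)

def draw_mdp(rng, n_states=3, n_actions=2):
    """Hypothetical stand-in for p_M: random transitions, rewards in [0, 1]."""
    P = []
    for _ in range(n_states):
        rows = []
        for _ in range(n_actions):
            w = [rng.random() for _ in range(n_states)]
            rows.append([x / sum(w) for x in w])
        P.append(rows)
    R = [[rng.random() for _ in range(n_actions)] for _ in range(n_states)]
    return P, R

def q_values(P_hat, R_hat, sweeps=20):
    """Value iteration on the estimated MDP; returns the estimate Q-hat."""
    n_s, n_a = len(R_hat), len(R_hat[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        Q = [[R_hat[s][a] + GAMMA * sum(p * v for p, v in zip(P_hat[s][a], V))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return Q

def discounted_return(formula, mdp, rng):
    """Run the index-based strategy induced by `formula` on one MDP."""
    P, R = mdp
    n_s, n_a = len(R), len(R[0])
    counts = [[[0] * n_s for _ in range(n_a)] for _ in range(n_s)]
    state, total = 0, 0.0
    for t in range(HORIZON):
        # Visit counts n(s, a) and the estimated model built from them.
        n = [[sum(counts[s][a]) for a in range(n_a)] for s in range(n_s)]
        P_hat = [[[c / n[s][a] for c in counts[s][a]] if n[s][a]
                  else [1.0 / n_s] * n_s
                  for a in range(n_a)] for s in range(n_s)]
        # Rewards are deterministic in this toy model, so one visit reveals them.
        R_hat = [[R[s][a] if n[s][a] else 0.0 for a in range(n_a)]
                 for s in range(n_s)]
        Q = q_values(P_hat, R_hat)
        # Play the action with the highest index value in the current state.
        action = max(range(n_a), key=lambda a: formula(
            R_hat[state][a], n[state][a], Q[state][a]))
        total += GAMMA ** t * R[state][action]
        nxt = rng.choices(range(n_s), weights=P[state][action])[0]
        counts[state][action][nxt] += 1
        state = nxt
    return total

# Illustrative candidate formulas over (r_hat, n, q_hat); not from the thesis.
CANDIDATES = {
    "Q": lambda r, n, q: q,
    "r": lambda r, n, q: r,
    "Q + 1/(n+1)": lambda r, n, q: q + 1.0 / (n + 1),
    "-n": lambda r, n, q: -n,
}

def select_formula(candidates, budget=120, seed=0):
    """UCB1 over formulas: each pull evaluates one formula on a fresh MDP."""
    rng = random.Random(seed)
    names = list(candidates)
    scale = (1 - GAMMA ** HORIZON) / (1 - GAMMA)  # max return, maps to [0, 1]
    pulls = {k: 0 for k in names}
    sums = {k: 0.0 for k in names}
    for t in range(1, budget + 1):
        untried = [k for k in names if pulls[k] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(names, key=lambda k: sums[k] / pulls[k]
                      + math.sqrt(2 * math.log(t) / pulls[k]))
        ret = discounted_return(candidates[arm], draw_mdp(rng), rng)
        sums[arm] += ret / scale
        pulls[arm] += 1
    best = max(names, key=lambda k: sums[k] / pulls[k])
    return best, pulls

if __name__ == "__main__":
    best, pulls = select_formula(CANDIDATES)
    print("selected formula:", best)
    print("pulls per formula:", pulls)
```

Each pull draws a fresh MDP, so the empirical mean of an arm estimates the average performance of its formula under the sampling distribution, which is the quantity the abstract proposes to maximize. Returns are divided by the maximum achievable truncated return so that they fall in [0, 1], as UCB1 assumes.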
Disciplines :
Computer science
Author, co-author :
Castronovo, Michaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Learning for exploration/exploitation in reinforcement learning
Defense date :
June 2012
Number of pages :
51
Institution :
ULiège - Université de Liège
Degree :
Master en sciences informatiques, à finalité approfondie
Promotor :
Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Louveaux, Quentin ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Wehenkel, Louis ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Geurts, Pierre ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science