Contribution to collective works (Parts of books)
Meta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Maes, Francis; Wehenkel, Louis; Ernst, Damien
2013 · In Filipe, Joaquim; Fred, Ana (Eds.) Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers
Peer reviewed
 

Files


Full Text
maes-icaart-long.pdf
Publisher postprint (348.86 kB)


Details



Keywords :
Exploration-exploitation dilemma; Prior knowledge; Multi-armed bandit problems; Reinforcement learning
Abstract :
[en] The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed “Bernoulli” bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies from the literature (UCB1, UCB1-TUNED, UCB-V, KL-UCB and ε-GREEDY); they also evaluate the robustness of the learnt E/E strategies through tests carried out on arms whose rewards follow a truncated Gaussian distribution.
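Illustrative sketch (not from the paper): the following Python fragment gives a concrete picture of the three-step procedure described in the abstract. It assumes a uniform prior over two-armed Bernoulli problems, a single-parameter UCB-like index formula as the hypothesis space, and plain random search as the optimizer; the paper itself uses richer numerical and symbolic-formula hypothesis spaces with dedicated optimization algorithms.

import numpy as np

rng = np.random.default_rng(0)

def sample_problem():
    # Step (i): prior over the target class -- here, a uniform distribution
    # over two-armed Bernoulli problems (success probabilities in [0, 1]).
    return rng.uniform(0.0, 1.0, size=2)

def play(theta, probs, budget=100):
    # Step (ii), one candidate strategy: pull the arm maximizing the index
    # mean + theta * sqrt(log(t) / n). theta is the learnable parameter
    # (an illustrative one-dimensional hypothesis space).
    counts = np.ones(2)                          # one initial pull per arm
    sums = rng.binomial(1, probs).astype(float)  # rewards of the initial pulls
    total = sums.sum()
    for t in range(2, budget):
        index = sums / counts + theta * np.sqrt(np.log(t) / counts)
        a = int(np.argmax(index))
        r = rng.binomial(1, probs[a])
        counts[a] += 1
        sums[a] += r
        total += r
    return total

def mean_performance(theta, n_problems=200, budget=100):
    # Average total reward of the strategy over problems drawn from the prior.
    return np.mean([play(theta, sample_problem(), budget) for _ in range(n_problems)])

# Step (iii): search the hypothesis space for the parameter with the best
# average performance over sampled problems (random search here; the paper
# proposes dedicated optimizers for its two hypothesis spaces).
candidates = rng.uniform(0.0, 3.0, size=20)
best = max(candidates, key=mean_performance)
print(f"meta-learnt exploration coefficient: {best:.3f}")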
Disciplines :
Computer science
Author, co-author :
Maes, Francis
Wehenkel, Louis  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Ernst, Damien  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Meta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Publication date :
2013
Main work title :
Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers
Editor :
Filipe, Joaquim
Fred, Ana
Publisher :
Springer
ISBN/EAN :
978-3-642-36907-0
Collection name :
Communications in Computer and Information Science, Volume 358
Pages :
110-115
Peer reviewed :
Peer reviewed
Available on ORBi :
since 14 May 2013
