Contribution to collective works (Parts of books)
Meta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Maes, Francis; Wehenkel, Louis; Ernst, Damien
2013 · In Filipe, Joaquim; Fred, Ana (Eds.) Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers
Peer reviewed
 

Files


Full Text
maes-icaart-long.pdf
Publisher postprint (348.86 kB)


Details



Keywords :
Exploration-exploitation dilemma; Prior knowledge; Multi-armed bandit problems; Reinforcement learning
Abstract :
[en] The exploration/exploitation (E/E) dilemma arises naturally in many subfields of Science. Multi-armed bandit problems formalize this dilemma in its canonical form. Most current research in this field focuses on generic solutions that can be applied to a wide range of problems. However, in practice, it is often the case that a form of prior information is available about the specific class of target problems. Prior knowledge is rarely used in current solutions due to the lack of a systematic approach to incorporate it into the E/E strategy. To address a specific class of E/E problems, we propose to proceed in three steps: (i) model prior knowledge in the form of a probability distribution over the target class of E/E problems; (ii) choose a large hypothesis space of candidate E/E strategies; and (iii) solve an optimization problem to find a candidate E/E strategy of maximal average performance over a sample of problems drawn from the prior distribution. We illustrate this meta-learning approach with two different hypothesis spaces: one where E/E strategies are numerically parameterized and another where E/E strategies are represented as small symbolic formulas. We propose appropriate optimization algorithms for both cases. Our experiments, with two-armed “Bernoulli” bandit problems and various playing budgets, show that the meta-learnt E/E strategies outperform generic strategies from the literature (UCB1, UCB1-TUNED, UCB-V, KL-UCB and ε-GREEDY); they also evaluate the robustness of the learnt E/E strategies through tests carried out on arms whose rewards follow a truncated Gaussian distribution.
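Illustrative sketch (not from the paper): the following Python fragment gives a concrete picture of the three-step procedure described in the abstract. It assumes a uniform prior over two-armed Bernoulli problems, a single-parameter UCB-like index formula as the hypothesis space, and plain random search as the optimizer; the paper itself uses richer numerical and symbolic-formula hypothesis spaces with dedicated optimization algorithms.

import numpy as np

rng = np.random.default_rng(0)

def sample_problem():
    # Step (i): prior over the target class -- here, a uniform distribution
    # over two-armed Bernoulli problems (success probabilities in [0, 1]).
    return rng.uniform(0.0, 1.0, size=2)

def play(theta, probs, budget=100):
    # Step (ii), one candidate strategy: pull the arm maximizing the index
    # mean + theta * sqrt(log(t) / n). theta is the learnable parameter
    # (an illustrative one-dimensional hypothesis space).
    counts = np.ones(2)                          # one initial pull per arm
    sums = rng.binomial(1, probs).astype(float)  # rewards of the initial pulls
    total = sums.sum()
    for t in range(2, budget):
        index = sums / counts + theta * np.sqrt(np.log(t) / counts)
        a = int(np.argmax(index))
        r = rng.binomial(1, probs[a])
        counts[a] += 1
        sums[a] += r
        total += r
    return total

def mean_performance(theta, n_problems=200, budget=100):
    # Average total reward of the strategy over problems drawn from the prior.
    return np.mean([play(theta, sample_problem(), budget) for _ in range(n_problems)])

# Step (iii): search the hypothesis space for the parameter with the best
# average performance over sampled problems (random search here; the paper
# proposes dedicated optimizers for its two hypothesis spaces).
candidates = rng.uniform(0.0, 3.0, size=20)
best = max(candidates, key=mean_performance)
print(f"meta-learnt exploration coefficient: {best:.3f}")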
Disciplines :
Computer science
Author, co-author :
Maes, Francis
Wehenkel, Louis  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Ernst, Damien  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Meta-learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case
Publication date :
2013
Main work title :
Agents and Artificial Intelligence: 4th International Conference, ICAART 2012, Vilamoura, Portugal, February 6-8, 2012. Revised Selected Papers
Editor :
Filipe, Joaquim
Fred, Ana
Publisher :
Springer
ISBN/EAN :
978-3-642-36907-0
Collection name :
Communications in Computer and Information Science, Volume 358
Pages :
110-115
Peer reviewed :
Peer reviewed
Available on ORBi :
since 14 May 2013
