This paper, together with the three papers "Model-free Monte Carlo-like policy evaluation", "Inferring bounds on the performance of a control policy from a sample of trajectories" and "A cautious approach to generalization in reinforcement learning", represents a body of work in batch-mode RL that is based on the rebuilding of trajectories. This file is a presentation of that body of work.
We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways to compute bounds on the return of control policies from a set of trajectories.
Disciplines :
Computer science
Author, co-author :
Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Murphy, Susan
Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Generating informative trajectories by using bounds on the return of control policies
Publication date :
May 2010
Event name :
Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
Event place :
Chia Laguna, Sardinia, Italy
Event date :
May 16, 2010
Audience :
International
Main work title :
Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)
Peer reviewed :
Peer reviewed
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture
F.R.S.-FNRS - Fonds de la Recherche Scientifique