We consider the problem of decision making in the context of unknown Markov decision processes with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic posterior sampling strategy based on the maximization of state-action value functions of MDPs sampled from the posterior. Preliminary experiments are promising.
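The strategy described in the abstract can be sketched as follows: maintain a posterior over MDP models, draw several MDPs from it, solve each one, and act greedily with respect to the element-wise maximum of the sampled state-action value functions. The sketch below is an illustration only, not the paper's implementation; the Dirichlet posterior, known rewards, the sample count, and all dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_samples, gamma = 3, 2, 5, 0.95

# Dirichlet posterior over transition probabilities: one pseudo-count
# vector per (state, action) pair (a uniform prior, for illustration).
alpha = np.ones((n_states, n_actions, n_states))
# Rewards are assumed known here to keep the sketch short.
R = rng.random((n_states, n_actions))

def q_value_iteration(P, R, gamma, n_iters=200):
    """Compute the optimal Q-function of one sampled MDP by value iteration."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        V = Q.max(axis=1)                          # greedy state values
        Q = R + gamma * np.einsum('sap,p->sa', P, V)
    return Q

# Optimistic posterior sampling: draw several MDPs from the posterior,
# solve each, and keep the element-wise maximum of their Q-functions.
Q_samples = []
for _ in range(n_samples):
    P = np.array([[rng.dirichlet(alpha[s, a]) for a in range(n_actions)]
                  for s in range(n_states)])       # sampled transition model
    Q_samples.append(q_value_iteration(P, R, gamma))
Q_opt = np.max(Q_samples, axis=0)

# Act greedily with respect to the optimistic Q-function.
policy = Q_opt.argmax(axis=1)
```

In a full agent, the chosen action's outcome would then update the posterior counts before the next round of sampling.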
Disciplines :
Computer science
Author, co-author :
Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Korda, Nathan; University of Oxford, England
Munos, Rémi; Inria Lille - Nord Europe
Language :
English
Title :
An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning
Publication date :
2013
Event name :
NIPS 2013 Workshop on Bayesian Optimization (BayesOpt2013)
Event date :
10 December 2013
Main work title :
NIPS 2013 Workshop on Bayesian Optimization (BayesOpt2013)