[en] We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system transition gathers a state, the action taken while being in this state, the immediate reward observed and the next state reached. In such a context, we propose a new model learning--type reinforcement learning (RL) algorithm in batch mode, finite-time and deterministic setting. The algorithm, named Voronoi reinforcement learning (VRL), approximates from a sample of system transitions the system dynamics and the reward function of the optimal control problem using piecewise constant functions on a Voronoi--like partition of the state-action space.
Disciplines :
Computer science
Author, co-author :
Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Voronoi model learning for batch mode reinforcement learning