[en] This work proposes an approach based on reward shaping
techniques in a reinforcement learning setting to approximate the optimal
decision-making process (also called the optimal policy) in a desired
task with a limited amount of data. We extract prior information from
an existing family of policies, which is used as a heuristic to guide the
construction of the new policy under this challenging condition. We use this
approach to study the relationship between the similarity of two tasks
and the minimal amount of data needed to compute a near-optimal policy
for the second one using the prior information of the existing policy.
Preliminary results show that even for the existing task least similar
to the desired one, only 10% of the dataset was needed
to compute the corresponding near-optimal policy.
Disciplines :
Computer science
Author, co-author :
Aittahar, Samy ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Sootla, Aivar
Other collaborator :
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids
Language :
English
Title :
Policy transfer using Value Function as Prior Information