Article (Scientific journals)
Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search
Jung, Tobias; Wehenkel, Louis; Ernst, Damien et al.
2014, in International Journal of Adaptive Control and Signal Processing, 28(3-5), pp. 255-289
Peer Reviewed verified by ORBi
 

Files

Full Text: acs2387.pdf, publisher postprint (1.96 MB)

All documents in ORBi are protected by a user license.

Details



Keywords :
Reinforcement learning; direct policy search; look-ahead tree search
Abstract :
[en] Direct policy search (DPS) and look-ahead tree (LT) policies are two popular techniques for solving difficult sequential decision-making problems. Both are simple to implement, widely applicable without strong assumptions on the structure of the problem, and capable of producing high-performance control policies. However, each is, in its own way, computationally very expensive. DPS can require huge offline resources (effort required to obtain the policy), first to select an appropriate space of parameterized policies that works well for the targeted problem, and then to determine the best values of the parameters via global optimization. LT policies require no offline resources, but they typically require huge online resources (effort required to calculate the best decision at each step) in order to grow trees of sufficient depth.

In this paper, we propose optimized look-ahead trees (OLT), a model-based policy learning scheme that lies at the intersection of DPS and LT. In OLT, the control policy is represented indirectly through an algorithm that, at each decision step, develops (as in LT, using a model of the dynamics) a small look-ahead tree until a prespecified online budget is exhausted. Unlike in LT, the development of the tree is not driven by a generic heuristic; rather, the heuristic is optimized for the target problem and implemented as a parameterized node scoring function learned offline via DPS.

We experimentally compare OLT with pure DPS and pure LT variants on optimal control benchmark domains. The results show that the LT-based representation is a versatile way of compactly representing policies in a DPS scheme, which makes OLT easier to tune and gives it lower offline complexity than pure DPS; at the same time, DPS significantly reduces the size of the look-ahead trees required to take high-quality decisions, which gives OLT lower online complexity than pure LT. Moreover, OLT produces better-performing policies overall than pure DPS and pure LT, and yields policies that are robust to perturbations of the initial conditions.
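The decision step described in the abstract (grow a small look-ahead tree, with node expansion driven by a learned scoring function, until the online budget is exhausted) can be sketched as follows. This is a minimal illustration, not the authors' implementation: all names and signatures (`olt_decision`, `model`, `reward`, `score`, `theta`) are hypothetical, the dynamics model is assumed deterministic, and best-first expansion via a priority queue stands in for whatever tree-development rule the paper actually uses.

```python
import heapq

def olt_decision(state, actions, model, reward, score, theta, budget):
    """One decision step of a hypothetical OLT-style policy.

    model(s, a)        -> next state under a deterministic dynamics model
    reward(s, a)       -> immediate reward of taking action a in state s
    score(node, theta) -> expansion priority from the learned, parameterized
                          node scoring function (parameters theta)
    A node is a tuple (state, cumulative_reward, first_action, depth).
    """
    frontier, counter = [], 0
    best_return, best_action = float("-inf"), None

    def push(node):
        nonlocal counter
        # Negate the score: heapq is a min-heap, but we want best-first expansion.
        heapq.heappush(frontier, (-score(node, theta), counter, node))
        counter += 1

    for a in actions:  # expand the root once so every action is represented
        push((model(state, a), reward(state, a), a, 1))

    expansions = 0
    while frontier and expansions < budget:  # online budget limits tree size
        _, _, (s, g, first_a, depth) = heapq.heappop(frontier)
        if g > best_return:  # track the best trajectory found so far
            best_return, best_action = g, first_a
        for a in actions:  # grow the tree below the expanded leaf
            push((model(s, a), g + reward(s, a), first_a, depth + 1))
        expansions += 1
    return best_action
```

In this sketch, offline DPS would correspond to optimizing `theta` (by global optimization over simulated episodes) so that good actions are found with a small `budget`; with a generic heuristic in place of the learned `score`, the same loop behaves like a pure LT policy.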
Disciplines :
Computer science
Author, co-author :
Jung, Tobias; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Computer Networks
Wehenkel, Louis; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Ernst, Damien; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Smart Grids
Maes, Francis; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Systems and Modeling
Language :
English
Title :
Optimized look-ahead tree policies: a bridge between look-ahead tree policies and direct policy search
Publication date :
March 2014
Journal title :
International Journal of Adaptive Control and Signal Processing
ISSN :
0890-6327
eISSN :
1099-1115
Publisher :
Wiley
Volume :
28
Issue :
3-5
Pages :
255-289
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 27 March 2012
