Lipschitz robust control from off-policy trajectories

Fonteneau, Raphaël; Ernst, Damien; Boigelot, Bernard; Louveaux, Quentin

Download

Paper published in a book (Scientific congresses and symposiums)

Lipschitz robust control from off-policy trajectories

Fonteneau, Raphaël; Ernst, Damien; Boigelot, Bernard et al.

2014 • In Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)

Peer reviewed

Permalink
https://hdl.handle.net/2268/172988

Files (1)Send to Details Statistics Bibliography Similar publications

Files

Full Text

CDC_SIAM.pdf

Author preprint (358.35 kB)

Download

All documents in ORBi are protected by a user license.

Send to

RIS BibTex APA Chicago Permalink X Linkedin

Details

Abstract :

[en] We study the minmax optimization problem introduced in [Fonteneau et al. (2011), ``Towards min max reinforcement learning'', Springer CCIS, vol. 129, pp. 61-77] for computing control policies for batch mode reinforcement learning in a deterministic setting with fixed, finite optimization horizon. First, we state that the $\min$ part of this problem is NP-hard. We then provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, can also be solved in polynomial time. We theoretically show that both relaxation schemes provide better results than those given in [Fonteneau et al. (2011)]

Disciplines :

Computer science

Author, co-author :

Fonteneau, Raphaël ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Smart grids

Boigelot, Bernard ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Informatique

Louveaux, Quentin ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Système et modélisation : Optimisation discrète

Language :

English

Title :

Lipschitz robust control from off-policy trajectories

Publication date :

2014

Event name :

53rd IEEE Conference on Decision and Control (IEEE CDC 2014)

Event place :

Los Angeles, United States

Event date :

December 15-17, 2014

Audience :

International

Main work title :

Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)

Peer review/Selection committee :

Peer reviewed

Available on ORBi :

since 14 October 2014

Statistics

Number of views

190 (15 by ULiège)

Number of downloads

332 (13 by ULiège)

More statistics

Scopus citations^®

Scopus citations^®
without self-citations

Bibliography

R. Sutton and A. Barto, Reinforcement Learning. MIT Press, 1998.
J. Ingersoll, Theory of Financial Decision Making. Rowman and Littlefield Publishers, Inc., 1987.
S. Murphy, "Optimal dynamic treatment regimes," Journal of the Royal Statistical Society, Series B, Vol. 65 (2), pp. 331-366, 2003.
-, "An experimental design for the development of adaptive treatment strategies," Statistics in Medicine, Vol. 24, pp. 1455-1481, 2005.
M. Riedmiller, "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," in Proceedings of the Sixteenth European Conference on Machine Learning (ECML 2005), Porto, Portugal, 2005, pp. 317-328.
S. Bradtke and A. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, Vol. 22, pp. 33-57, 1996.
D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," Journal of Machine Learning Research, Vol. 6, pp. 503-556, 2005.
M. Lagoudakis and R. Parr, "Least-squares policy iteration," Jounal of Machine Learning Research, Vol. 4, pp. 1107-1149, 2003.
D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, Vol. 49, no. 2-3, pp. 161-178, 2002.
R. Fonteneau, "Contributions to Batch Mode Reinforcement Learning," Ph.D. dissertation, University of Liège, 2011.
D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Athena Scientific, 1996.
L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming using Function Approximators. Taylor & Francis CRC Press, 2010.
R. Fonteneau, S. Murphy, L. Wehenkel, and D. Ernst, "Computing bounds for kernel-based policy evaluation in reinforcement learning," University of Liège, Tech. Rep., 2010.
-, "Towards min max generalization in reinforcement learning," in Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers. Series: Communications in Computer and Information Science (CCIS), Vol. 129. Springer, Heidelberg, 2011, pp. 61-77.
-, "Inferring bounds on the performance of a control policy from a sample of trajectories," in Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (IEEE ADPRL 09), Nashville, TN, USA, 2009.
-, "A cautious approach to generalization in reinforcement learning," in Proceedings of the Second International Conference on Agents and Artificial Intelligence (ICAART 2010), Valencia, Spain, 2010.
A. Conn, N. Gould, and P. Toint, Trust-region Methods. Society for Industrial Mathematics, 2000, vol. 1.
R. Fonteneau, D. Ernst, B. Boigelot, and Q. Louveaux, "Min max generalization for deterministic batch mode reinforcement learning: Relaxation schemes," SIAM Journal on Control and Optimization, Vol. 51, no. 5, pp. 3355-3385, 2013. [Online]. Available: http://epubs.siam.org/doi/abs/10.1137/120867263
E. Delage and S. Mannor, "Percentile optimization for Markov decision processes with parameter uncertainty," Operations Research, Vol. 58, no. 1, pp. 203-213, 2010.
M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proceedings of the Eleventh International Conference on Machine Learning (ICML 1994), New Brunswick, NJ, USA, 1994.
M. Rovatous and M. Lagoudakis, "Minimax search and reinforcement learning for adversarial tetris," in Proceedings of the 6th Hellenic Conference on Artificial Intelligence (SETN'10), Athens, Greece, 2010.
M. L. Littman, "A tutorial on partially observable markov decision processes," Journal of Mathematical Psychology, Vol. 53, no. 3, pp. 119-125, 2009, special Issue: Dynamic Decision Making.
S. Koenig, "Minimax real-time heuristic search," Artificial Intelligence, Vol. 129, no. 1-2, pp. 165-197, 2001.
S. Mannor, D. Simester, P. Sun, and J. Tsitsiklis, "Bias and variance in value function estimation," in Proceedings of the Twenty-first International Conference on Machine Learning (ICML 2004), Banff, Alberta, Canada, 2004.
M. Qian and S. Murphy, "Performance guarantees for individualized treatment rules," Department of Statistics, University of Michigan, Tech. Rep. 498, 2009.
C. Paduraru, D. Precup, and J. Pineau, "A framework for computing bounds for the return of a policy," in Ninth European Workshop on Reinforcement Learning (EWRL9), 2011.
L. Hansen and T. Sargent, "Robust Control and Model Uncertainty," American Economic Review, pp. 60-66, 2001.
T. Başar and P. Bernhard, H∞-optimal control and related minimax design problems: a dynamic game approach. Birkhauser, 1995, vol. 5.
E. Camacho and C. Bordons, Model Predictive Control. Springer, 2004.
D. Ernst, M. Glavic, F. Capitanescu, and L. Wehenkel, "Reinforcement learning versus model predictive control: a comparison on a power system problem," IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, Vol. 39, pp. 517-529, 2009.
P. Scokaert and D. Mayne, "Min-max feedback model predictive control for constrained linear systems," IEEE Transactions on Automatic Control, Vol. 43, no. 8, pp. 1136-1142, 1998.
A. Bemporad and M. Morari, "Robust model predictive control: A survey," Robustness in Identification and Control, Vol. 245, pp. 207-226, 1999.
J. Birge and F. Louveaux, Introduction to Stochastic Programming. Springer Verlag, 1997.
B. Defourny, D. Ernst, and L. Wehenkel, "Risk-aware decision making and dynamic programming," Selected for oral presentation at the NIPS-08 Workshop on Model Uncertainty and Risk in Reinforcement Learning, Whistler, Canada, 2008.
A. Shapiro, "A dynamic programming approach to adjustable robust optimization," Operations Research Letters, Vol. 39, no. 2, pp. 83-87, 2011.
-, "Minimax and risk averse multistage stochastic programming," School of Industrial & Systems Engineering, Georgia Institute of Technology, Tech. Rep., 2011.
A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro, "Robust stochastic approximation approach to stochastic programming," SIAM Journal on Optimization, Vol. 19, no. 4, pp. 1574-1609, 2009.
D. Henrion, S. Tarbouriech, and D. Arzelier, "LMI approximations for the radius of the intersection of ellipsoids: Survey," Journal of Optimization Theory and Applications, Vol. 108, no. 1, pp. 1-28, 2001.
S. Boyd, L. El-Ghaoui, E. Feron, V. Balakrishnan, and E. Yaz, "Linear matrix inequalities in system and control theory," Proceedings of the IEEE, Vol. 85, no. 4, pp. 698-699, 1997.
P. Pardalos and S. Vavasis, "Quadratic programming with one negative eigenvalue is NP-hard," Journal of Global Optimization, Vol. 1, no. 1, pp. 15-22, 1991.
R. Freund and J. Orlin, "On the complexity of four polyhedral set containment problems," Mathematical programming, Vol. 33, no. 2, pp. 139-145, 1985.
C. Papadimitriou, "On the complexity of integer programming," Journal of the ACM (JACM), Vol. 28, no. 4, pp. 765-768, 1981.
J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms: Fundamentals. Springer-Verlag, 1996, vol. 305.
L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Review, Vol. 38, no. 1, pp. 49-95, 1996.
A. d'Aspremont and S. Boyd, "Relaxations and randomized methods for nonconvex qcqps," EE392o Class Notes, Stanford University, 2003.
Y. Nesterov, H. Wolkowicz, and Y. Ye, "Semidefinite programming relaxations of nonconvex quadratic optimization," Handbook of semidef-inite programming, pp. 361-419, 2000.