Keywords: approximate dynamic programming; electric power oscillations damping; fitted Q iteration; interior point method; model predictive control; reinforcement learning; tree-based supervised learning
Abstract:
This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available.
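To make the two approaches concrete, the sketches below illustrate them in Python. They are minimal illustrations only: the dynamics, cost function, action set, horizon, discount factor, and every function and parameter name are assumptions introduced here for exposition, not taken from the paper.

The first sketch follows the batch-mode RL scheme: fitted Q iteration with a tree-based regressor (scikit-learn's ExtraTreesRegressor standing in for extremely randomized trees). A Q-function is refitted repeatedly on targets built from the collected trajectories and instantaneous costs, and the closed-loop policy picks the cost-minimizing action.

```python
# Minimal sketch of fitted Q iteration on a batch of one-step transitions.
# All names and settings here are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.95, n_iterations=50):
    """transitions: list of (x, u, cost, x_next) tuples sampled from system trajectories."""
    X = np.array([np.append(x, u) for x, u, _, _ in transitions])   # regressor input: (state, action)
    costs = np.array([c for _, _, c, _ in transitions])
    next_states = np.array([x_next for _, _, _, x_next in transitions])

    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = costs                                          # Q_1(x,u) = c(x,u)
        else:
            # Q_N(x,u) = c(x,u) + gamma * min_{u'} Q_{N-1}(x', u')
            next_q = np.column_stack([
                q_model.predict(np.column_stack([next_states,
                                                 np.full(len(next_states), u)]))
                for u in actions
            ])
            targets = costs + gamma * next_q.min(axis=1)
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_model

def greedy_action(q_model, x, actions):
    """Closed-loop policy: the action minimizing the learned Q-function at state x."""
    q_values = [q_model.predict(np.append(x, u).reshape(1, -1))[0] for u in actions]
    return actions[int(np.argmin(q_values))]
```

The second sketch mirrors the MPC side: the finite-horizon problem is written as a nonlinear program whose equality constraints encode the system dynamics, and handed to a generic constrained solver (scipy's trust-constr method is used here as a stand-in for the interior-point solver). Only the first control of the open-loop solution is applied, receding-horizon style.

```python
# Minimal sketch of the open-loop MPC optimization with dynamics as equality constraints.
# f (dynamics), c (stage cost), T, n_x, n_u and u_bounds are hypothetical placeholders.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def mpc_first_control(x0, f, c, T, n_x, n_u, u_bounds):
    """min over (x_1..x_T, u_0..u_{T-1}) of sum_t c(x_t, u_t) s.t. x_{t+1} = f(x_t, u_t)."""
    def unpack(z):
        xs = z[:T * n_x].reshape(T, n_x)          # predicted states x_1 ... x_T
        us = z[T * n_x:].reshape(T, n_u)          # controls u_0 ... u_{T-1}
        return xs, us

    def objective(z):
        xs, us = unpack(z)
        states = np.vstack([x0, xs[:-1]])         # x_0 ... x_{T-1}
        return sum(c(x, u) for x, u in zip(states, us))

    def dynamics_residual(z):                     # forcing this to 0 enforces x_{t+1} = f(x_t, u_t)
        xs, us = unpack(z)
        states = np.vstack([x0, xs[:-1]])
        return np.concatenate([xs[t] - f(states[t], us[t]) for t in range(T)])

    z0 = np.zeros(T * (n_x + n_u))                # naive initial guess
    dynamics = NonlinearConstraint(dynamics_residual, 0.0, 0.0)
    bounds = [(None, None)] * (T * n_x) + list(u_bounds) * T
    result = minimize(objective, z0, method="trust-constr",
                      constraints=[dynamics], bounds=bounds)
    _, us = unpack(result.x)
    return us[0]                                  # apply only the first control (receding horizon)
```

The contrast highlighted by the abstract is visible in the signatures: the RL sketch consumes only logged transitions and cost samples, while the MPC sketch requires the analytical dynamics f and cost c to be supplied explicitly.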