Keywords: approximate dynamic programming; electric power oscillations damping; fitted Q iteration; interior point method; model predictive control; reinforcement learning; tree-based supervised learning
Abstract
This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear, deterministic electrical power oscillations damping problem. Both families of methods are based on formulating the control problem as a discrete-time optimal control problem. The MPC approach considered exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics appear as equality constraints. The RL approach considered infers closed-loop policies in a model-free way from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results provide insight into the pros and cons of the two approaches and show that RL can be competitive with MPC even when a good deterministic system model is available.
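The RL method named in the keywords and abstract is batch-mode fitted Q iteration with tree-based regressors, which repeatedly refits a Q-function approximator on a batch of one-step transition samples. The Python sketch below is a minimal illustration of that idea only, not the paper's implementation: it assumes a finite, discretized action set, uses scikit-learn's ExtraTreesRegressor as the supervised learner, and the function names and parameter values (gamma, n_iterations, n_estimators) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.95, n_iterations=50):
    """transitions: list of (x, u, c, x_next); x and x_next are 1-D state arrays,
    u is a scalar control, c the instantaneous cost."""
    X = np.array([np.append(x, u) for x, u, _, _ in transitions])
    costs = np.array([c for _, _, c, _ in transitions])
    next_states = np.array([x_next for _, _, _, x_next in transitions])

    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = costs  # first iteration: Q_1(x, u) = c(x, u)
        else:
            # Q_N(x, u) = c(x, u) + gamma * min_{u'} Q_{N-1}(x', u')
            q_next = np.column_stack([
                q_model.predict(
                    np.column_stack([next_states,
                                     np.full(len(next_states), a)]))
                for a in actions
            ])
            targets = costs + gamma * q_next.min(axis=1)
        # each iteration is one batch-mode supervised learning problem
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_model

def greedy_action(q_model, x, actions):
    """Closed-loop policy: pick the control minimizing the learned Q at state x."""
    q_values = [q_model.predict(np.append(x, a).reshape(1, -1))[0] for a in actions]
    return actions[int(np.argmin(q_values))]
```

The greedy_action helper turns the learned Q-function into the closed-loop policy described in the abstract, selecting the action with the lowest predicted cost-to-go at the current state.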