Keywords: approximate dynamic programming; electric power oscillations damping; fitted Q iteration; interior point method; model predictive control; reinforcement learning; tree-based supervised learning
Abstract:
This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available.
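To make the two approaches concrete, the sketches below illustrate them in Python. They are minimal illustrations only: the dynamics, cost function, action set, horizon, discount factor, and every function and parameter name are assumptions introduced here for exposition, not taken from the paper.

The first sketch follows the batch-mode RL scheme: fitted Q iteration with a tree-based regressor (scikit-learn's ExtraTreesRegressor standing in for extremely randomized trees). A Q-function is refitted repeatedly on targets built from the collected trajectories and instantaneous costs, and the closed-loop policy picks the cost-minimizing action.

```python
# Minimal sketch of fitted Q iteration on a batch of one-step transitions.
# All names and settings here are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, actions, gamma=0.95, n_iterations=50):
    """transitions: list of (x, u, cost, x_next) tuples sampled from system trajectories."""
    X = np.array([np.append(x, u) for x, u, _, _ in transitions])   # regressor input: (state, action)
    costs = np.array([c for _, _, c, _ in transitions])
    next_states = np.array([x_next for _, _, _, x_next in transitions])

    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            targets = costs                                          # Q_1(x,u) = c(x,u)
        else:
            # Q_N(x,u) = c(x,u) + gamma * min_{u'} Q_{N-1}(x', u')
            next_q = np.column_stack([
                q_model.predict(np.column_stack([next_states,
                                                 np.full(len(next_states), u)]))
                for u in actions
            ])
            targets = costs + gamma * next_q.min(axis=1)
        q_model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q_model

def greedy_action(q_model, x, actions):
    """Closed-loop policy: the action minimizing the learned Q-function at state x."""
    q_values = [q_model.predict(np.append(x, u).reshape(1, -1))[0] for u in actions]
    return actions[int(np.argmin(q_values))]
```

The second sketch mirrors the MPC side: the finite-horizon problem is written as a nonlinear program whose equality constraints encode the system dynamics, and handed to a generic constrained solver (scipy's trust-constr method is used here as a stand-in for the interior-point solver). Only the first control of the open-loop solution is applied, receding-horizon style.

```python
# Minimal sketch of the open-loop MPC optimization with dynamics as equality constraints.
# f (dynamics), c (stage cost), T, n_x, n_u and u_bounds are hypothetical placeholders.
import numpy as np
from scipy.optimize import minimize, NonlinearConstraint

def mpc_first_control(x0, f, c, T, n_x, n_u, u_bounds):
    """min over (x_1..x_T, u_0..u_{T-1}) of sum_t c(x_t, u_t) s.t. x_{t+1} = f(x_t, u_t)."""
    def unpack(z):
        xs = z[:T * n_x].reshape(T, n_x)          # predicted states x_1 ... x_T
        us = z[T * n_x:].reshape(T, n_u)          # controls u_0 ... u_{T-1}
        return xs, us

    def objective(z):
        xs, us = unpack(z)
        states = np.vstack([x0, xs[:-1]])         # x_0 ... x_{T-1}
        return sum(c(x, u) for x, u in zip(states, us))

    def dynamics_residual(z):                     # forcing this to 0 enforces x_{t+1} = f(x_t, u_t)
        xs, us = unpack(z)
        states = np.vstack([x0, xs[:-1]])
        return np.concatenate([xs[t] - f(states[t], us[t]) for t in range(T)])

    z0 = np.zeros(T * (n_x + n_u))                # naive initial guess
    dynamics = NonlinearConstraint(dynamics_residual, 0.0, 0.0)
    bounds = [(None, None)] * (T * n_x) + list(u_bounds) * T
    result = minimize(objective, z0, method="trust-constr",
                      constraints=[dynamics], bounds=bounds)
    _, us = unpack(result.x)
    return us[0]                                  # apply only the first control (receding horizon)
```

The contrast highlighted by the abstract is visible in the signatures: the RL sketch consumes only logged transitions and cost samples, while the MPC sketch requires the analytical dynamics f and cost c to be supplied explicitly.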