Risk-Sensitive Policy with Distributional Reinforcement Learning

Théate, Thibaut; Ernst, Damien

doi:10.3390/a16070325

Article (Scientific journals)

2023 • In Algorithms, 16 (325), p. 16

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/297883

Files

Full Text

algorithms-16-00325.pdf

Author postprint (2.38 MB)

Details

Keywords :

Distributional reinforcement learning; Sequential decision-making; Risk-sensitive policy; Risk management; Deep neural networks

Abstract :

Classical reinforcement learning (RL) techniques are generally concerned with designing decision-making policies that maximise the expected outcome. This approach, however, does not account for the risk associated with the actions taken, which may be critical in certain applications. To address this issue, the present work introduces a novel methodology based on distributional RL for deriving sequential decision-making policies that are sensitive to risk, with risk modelled by the tail of the return probability distribution. The core idea is to replace the Q function, which generally stands at the centre of RL learning schemes, with another function that takes into account both the expected return and the risk. Named the risk-based utility function U, it can be extracted from the random return distribution Z naturally learnt by any distributional RL algorithm. This makes it possible to span the complete trade-off between risk minimisation and expected return maximisation, in contrast to fully risk-averse methodologies. Overall, this research yields a practical and accessible solution for learning risk-sensitive policies with minimal modification to the distributional RL algorithm, with an emphasis on the interpretability of the resulting decision-making process.
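
The abstract does not give the exact form of U, so the following is only a minimal sketch under stated assumptions: U is taken to be a convex combination of the expected return and a tail-risk measure, with the conditional value-at-risk (CVaR) of the lower tail standing in for the risk term. The names `alpha`, `rho`, `cvar`, `risk_based_utility` and `greedy_action`, as well as the sample values, are hypothetical and chosen for illustration.

```python
from statistics import fmean


def cvar(samples, rho):
    """Mean of the worst rho-fraction of return samples (lower tail).

    Assumption: CVaR is used here as the tail-risk measure; the abstract
    only says that risk is modelled by the tail of the return distribution.
    """
    ordered = sorted(samples)
    k = max(1, int(rho * len(ordered)))  # keep at least one sample in the tail
    return fmean(ordered[:k])


def risk_based_utility(samples, alpha, rho):
    """Assumed form: alpha * expected return + (1 - alpha) * tail risk."""
    return alpha * fmean(samples) + (1 - alpha) * cvar(samples, rho)


def greedy_action(return_samples, alpha, rho=0.25):
    """Pick the action maximising U rather than Q = E[Z]."""
    return max(
        return_samples,
        key=lambda a: risk_based_utility(return_samples[a], alpha, rho),
    )


# Hypothetical samples of the return distribution Z(s, a) for one state,
# such as a distributional RL algorithm might provide for each action.
return_samples = {
    "safe": [1.0, 1.0, 1.0, 1.0],
    "risky": [-4.0, 2.0, 4.0, 6.0],
}

print(greedy_action(return_samples, alpha=1.0))  # purely expectation-driven
print(greedy_action(return_samples, alpha=0.5))  # balances return and tail risk
```

In this sketch, `alpha = 1.0` reduces U to the usual expected return and selects the "risky" action (mean 2.0 against 1.0), whereas `alpha = 0.5` penalises its poor lower tail and selects "safe". Sweeping `alpha` between 0 and 1 is one way to picture the trade-off between risk minimisation and expected return maximisation described above.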

Disciplines :

Computer science

Author, co-author :

Théate, Thibaut ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Ernst, Damien ; Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science

Language :

English

Publication date :

30 June 2023

Journal title :

Algorithms

ISSN :

1999-4893

Publisher :

MDPI Open Access Publishing, Switzerland

Volume :

16

Issue :

325

Pages :

16

Peer reviewed :

Peer Reviewed verified by ORBi

Funders :

F.R.S.-FNRS - Fonds de la Recherche Scientifique

Available on ORBi :

since 30 December 2022
