Reference : A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding
E-prints/Working papers : First made available on ORBi
Engineering, computing & technology : Energy
A Deep Reinforcement Learning Framework for Continuous Intraday Market Bidding
English
Boukas, Ioannis [Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Smart Microgrids]
Ernst, Damien [Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Smart Grids]
Théate, Thibaut [Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Smart Grids]
Bolland, Adrien [Université de Liège - ULiège > Master in Electrical Engineering]
Huynen, Alexandre
Buchwald, Martin
Wynants, Christelle
Cornélusse, Bertrand [Université de Liège - ULiège > Department of Electrical Engineering and Computer Science (Montefiore Institute) > Smart Microgrids]
Apr-2020
Yes
[en] reinforcement learning ; electricity markets ; intra-day markets ; energy transition ; artificial intelligence
[en] The large-scale integration of variable energy resources is expected to shift a large share of energy exchanges closer to real time, where more accurate forecasts are available. In this context, short-term electricity markets, and in particular the intraday market, are considered a suitable trading floor for these exchanges to occur. A key component for the successful integration of renewable energy sources is the use of energy storage. In this paper, we propose a novel modelling framework for the strategic participation of energy storage in the European continuous intraday market, where exchanges occur through a centralized order book. The goal of the storage device operator is to maximize the profits received over the entire trading horizon, while taking into account the operational constraints of the unit. The sequential decision-making problem of trading in the intraday market is modelled as a Markov Decision Process. An asynchronous version of the fitted Q iteration algorithm is chosen for solving this problem due to its sample efficiency. The large and variable number of orders in the order book motivates the use of high-level actions and an alternative state representation. Historical data are used to generate a large number of artificial trajectories in order to address exploration issues during the learning process. The resulting policy is back-tested and compared against a benchmark strategy that represents the current industrial standard. Results indicate that the agent converges to a policy that achieves, on average, higher total revenues than the benchmark strategy.
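To make the learning procedure concrete, below is a minimal sketch of fitted Q iteration in Python, shown in its standard batch form rather than the asynchronous variant proposed in the paper. The extra-trees regressor, the synthetic transition set, and all dimensions are illustrative assumptions standing in for the historical order-book trajectories, high-level actions, and state representation described in the abstract.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Hypothetical one-step transitions (s, a, r, s'); the paper generates these
# from historical intraday order-book data rather than random draws.
n, state_dim, n_actions = 1000, 4, 3
S = rng.normal(size=(n, state_dim))            # states
A = rng.integers(0, n_actions, size=n)         # high-level actions
R = rng.normal(size=n)                         # trading rewards
S_next = rng.normal(size=(n, state_dim))       # successor states

gamma, n_iterations = 0.99, 20
X = np.column_stack([S, A])                    # regressor input: (state, action)
q = None

for _ in range(n_iterations):
    if q is None:
        y = R                                  # first iteration: Q_1(s, a) = r
    else:
        # Bootstrapped target: r + gamma * max_a' Q_k(s', a')
        q_next = np.column_stack([
            q.predict(np.column_stack([S_next, np.full(n, a)]))
            for a in range(n_actions)
        ])
        y = R + gamma * q_next.max(axis=1)
    q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)

def greedy_action(state):
    # Greedy policy: pick the action with the highest estimated Q-value.
    values = [q.predict(np.append(state, a).reshape(1, -1))[0]
              for a in range(n_actions)]
    return int(np.argmax(values))

print(greedy_action(S[0]))

In the paper's setting the state would additionally encode the order book and the storage unit's operational constraints; the sketch only illustrates the structure of the iteration.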
Researchers ; Professionals ; Students
http://hdl.handle.net/2268/232846
http://arxiv.org/abs/2004.05940

File(s) associated to this reference

Fulltext file(s):

2004.05940.pdf (Publisher postprint, 929.25 kB, Open access)

All documents in ORBi are protected by a user license.