Article (Scientific journals)
Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
Sabatelli, Matthia; Louppe, Gilles; Geurts, Pierre et al.
2019, in Advances in Neural Information Processing Systems
Peer Reviewed verified by ORBi
 

Files


Full Text
Approximating_two_value_functions_instead_of_one__towards_characterizing_a_new_family_of_Deep_Reinforcement_Learning_algorithms__camera_ready.pdf
Publisher postprint (998.3 kB)



Details



Keywords :
Deep Reinforcement Learning; Function Approximators; Model-free Deep Reinforcement Learning
Abstract :
[en] This paper takes a step towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. These algorithms aim to jointly learn an approximation of the state-value function (V) and an approximation of the state-action value function (Q). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN). To investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can be harmed and introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the V and Q functions learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to the overestimation bias of the Q function.
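The abstract does not spell out how V and Q are learned jointly. The following is a minimal sketch, assuming the temporal-difference targets usually given for DQV and DQV-Max: in DQV both functions regress toward a target bootstrapped from a V target network, while in DQV-Max V bootstraps from the maximum of a Q target network and Q bootstraps from V. The function and parameter names are hypothetical, and the network evaluations are passed in as plain numbers.

```python
# Sketch only: these targets are an assumption about DQV / DQV-Max,
# not something stated in the abstract. All names are hypothetical.

def dqv_targets(reward, gamma, v_next_target, done):
    """DQV: V and Q share one target built from the V target network at s'."""
    y = reward if done else reward + gamma * v_next_target
    return y, y  # (target for V, target for Q)


def dqv_max_targets(reward, gamma, q_next_target, v_next_online, done):
    """DQV-Max: V bootstraps from max_a Q(s', a); Q bootstraps from V(s')."""
    if done:
        # No bootstrapping from a terminal state.
        return reward, reward
    v_target = reward + gamma * max(q_next_target)  # off-policy max over actions
    q_target = reward + gamma * v_next_online
    return v_target, q_target


if __name__ == "__main__":
    print(dqv_targets(1.0, 0.5, 2.0, done=False))
    print(dqv_max_targets(1.0, 0.5, [1.0, 4.0], 2.0, done=False))
```

In both cases the Q function is never bootstrapped from its own maximum, which is one plausible reading of why the abstract reports a reduced overestimation bias.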
Disciplines :
Computer science
Author, co-author :
Sabatelli, Matthia; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Louppe, Gilles; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Wiering, Marco; University of Groningen > Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence
Language :
English
Title :
Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
Publication date :
December 2019
Journal title :
Advances in Neural Information Processing Systems
ISSN :
1049-5258
Publisher :
Morgan Kaufmann Publishers, San Mateo, United States - California
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 15 October 2019

Statistics


Number of views
85 (10 by ULiège)
Number of downloads
64 (6 by ULiège)
