Article (Scientific journals)
Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
Sabatelli, Matthia; Louppe, Gilles; Geurts, Pierre et al.
2019, in Advances in Neural Information Processing Systems
Peer Reviewed verified by ORBi
 

Files


Full Text
Approximating_two_value_functions_instead_of_one__towards_characterizing_a_new_family_of_Deep_Reinforcement_Learning_algorithms__camera_ready.pdf
Publisher postprint (998.3 kB)

Details



Keywords :
Deep Reinforcement Learning; Function Approximators; Model-free Deep Reinforcement Learning
Abstract :
This paper makes one step forward towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. The aim of these algorithms is to jointly learn an approximation of the state-value function (V), alongside an approximation of the state-action value function (Q). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN). Intending to investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present some specific cases in which DQV's performance can get harmed and introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the V and Q functions that are learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to suffer from the overestimation bias of the Q function.
Disciplines :
Computer science
Author, co-author :
Sabatelli, Matthia; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Louppe, Gilles; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Wiering, Marco; University of Groningen > Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence
Language :
English
Title :
Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms
Publication date :
December 2019
Journal title :
Advances in Neural Information Processing Systems
ISSN :
1049-5258
Publisher :
Morgan Kaufmann Publishers, San Mateo, United States - California
Peer reviewed :
Peer Reviewed verified by ORBi
Available on ORBi :
since 15 October 2019

