Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

Sabatelli, Matthia; Louppe, Gilles; Geurts, Pierre; Wiering, Marco

Article (Scientific journals)

2019 • In Advances in Neural Information Processing Systems

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/240287

Files

Full Text

Approximating_two_value_functions_instead_of_one__towards_characterizing_a_new_family_of_Deep_Reinforcement_Learning_algorithms__camera_ready.pdf

Publisher postprint (998.3 kB)


Details

Keywords :

Deep Reinforcement Learning; Function Approximators; Model-free Deep Reinforcement Learning

Abstract :

This paper takes a step towards characterizing a new family of model-free Deep Reinforcement Learning (DRL) algorithms. These algorithms aim to jointly learn an approximation of the state-value function (V) and an approximation of the state-action value function (Q). Our analysis starts with a thorough study of the Deep Quality-Value Learning (DQV) algorithm, a DRL algorithm which has been shown to outperform popular techniques such as Deep-Q-Learning (DQN) and Double-Deep-Q-Learning (DDQN). To investigate why DQV's learning dynamics allow this algorithm to perform so well, we formulate a set of research questions which help us characterize a new family of DRL algorithms. Among our results, we present specific cases in which DQV's performance can be harmed and introduce a novel off-policy DRL algorithm, called DQV-Max, which can outperform DQV. We then study the behavior of the V and Q functions learned by DQV and DQV-Max and show that both algorithms might perform so well on several DRL test-beds because they are less prone to the overestimation bias of the Q function.
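
The abstract does not spell out the update rules, so the following is only a minimal sketch of how jointly learning V and Q might look at the level of temporal-difference targets. It assumes the forms usually given for these algorithms: DQV bootstraps both V and Q from the next-state value V(s'), while DQV-Max bootstraps V from max_a Q(s', a) and Q from V(s'). The function names, the `done` mask, and the NumPy arrays standing in for network outputs are hypothetical and chosen for illustration; target networks and replay buffers are omitted.

```python
import numpy as np


def dqv_targets(reward, v_next, done, gamma=0.99):
    """Assumed DQV targets: V and Q both regress towards r + gamma * V(s')."""
    # (1 - done) zeroes the bootstrap term on terminal transitions.
    target = reward + gamma * v_next * (1.0 - done)
    return target, target  # (v_target, q_target)


def dqv_max_targets(reward, v_next, q_next, done, gamma=0.99):
    """Assumed DQV-Max targets: V uses max_a Q(s', a), Q uses V(s')."""
    v_target = reward + gamma * q_next.max(axis=1) * (1.0 - done)
    q_target = reward + gamma * v_next * (1.0 - done)
    return v_target, q_target


# Toy batch of two transitions; the second one is terminal.
reward = np.array([1.0, 0.0])
v_next = np.array([2.0, 3.0])                # stand-in for V(s')
q_next = np.array([[1.0, 4.0], [0.5, 0.2]])  # stand-in for Q(s', .)
done = np.array([0.0, 1.0])

dqv_v, dqv_q = dqv_targets(reward, v_next, done, gamma=0.5)
max_v, max_q = dqv_max_targets(reward, v_next, q_next, done, gamma=0.5)
```

Under these assumptions, the Q target in both variants never takes a maximum over Q's own estimates, which is one way to read the abstract's remark about reduced overestimation bias.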

Disciplines :

Computer science

Author, co-author :

Sabatelli, Matthia ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique

Louppe, Gilles ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique

Wiering, Marco; University of Groningen > Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence

Language :

English

Title :

Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms

Publication date :

December 2019

Journal title :

Advances in Neural Information Processing Systems

ISSN :

1049-5258

Publisher :

Morgan Kaufmann Publishers, San Mateo, United States - California

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 15 October 2019

Statistics

Number of views

103 (10 by ULiège)

Number of downloads

80 (7 by ULiège)
