Keywords :
model-free deep reinforcement learning; temporal-difference learning; DQV; DQV-Max-Learning
Abstract :
[en] We present a novel approach for learning an approximation of the optimal state-action value function (Q) in model-free Deep Reinforcement Learning (DRL). We propose to learn this approximation while simultaneously learning an approximation of the state-value function (V). We introduce two new DRL algorithms, called DQV-Learning and DQV-Max-Learning, which follow this specific learning dynamic. In short, both algorithms use two neural networks to separately learn the V function and the Q function. We validate the effectiveness of this training scheme by thoroughly comparing our algorithms to DRL methods which only learn an approximation of the Q function, namely DQN and DDQN. Our results show that DQV and DQV-Max offer several important benefits: they converge significantly faster, can achieve super-human performance on DRL testbeds on which DQN and DDQN failed to do so, and suffer less from the overestimation bias of the Q function.
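The core idea in the abstract, learning V and Q jointly so that both estimators bootstrap from the state-value estimate, can be illustrated with a minimal tabular sketch. Everything below (the toy two-state MDP, the learning rate, the discount factor, and the function names) is an illustrative assumption for exposition, not code from the paper; the actual algorithms use two deep neural networks rather than tables.

```python
import numpy as np

# Hedged sketch: a tabular analogue of the DQV learning dynamic described in
# the abstract. Two separate estimators are kept for V and Q; both regress
# toward the same temporal-difference target r + gamma * V(s'), so the Q
# estimate bootstraps from the V estimate instead of from max_a Q(s', a).
# The MDP, alpha, and gamma below are illustrative assumptions.

n_states, n_actions = 2, 2
V = np.zeros(n_states)               # state-value estimates
Q = np.zeros((n_states, n_actions))  # state-action value estimates
alpha, gamma = 0.1, 0.9

def dqv_update(s, a, r, s_next, done):
    """One DQV-style step: both V and Q bootstrap from the V estimate."""
    target = r + (0.0 if done else gamma * V[s_next])
    V[s] += alpha * (target - V[s])        # TD update for the state-value
    Q[s, a] += alpha * (target - Q[s, a])  # Q regresses toward the same target

# A few synthetic transitions on the toy MDP: state 1 yields reward 1,
# state 0 yields reward 0, and the next state is drawn uniformly.
rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.integers(n_states)
    a = rng.integers(n_actions)
    dqv_update(s, a, float(s == 1), rng.integers(n_states), done=False)
```

Because the target never involves a max over Q, this update avoids the maximization step that drives the overestimation bias of Q-learning, which is consistent with the benefit claimed in the abstract; DQV-Max-Learning instead mixes in a max-based target when updating V.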
Disciplines :
Computer science
Author, co-author :
Sabatelli, Matthia ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Louppe, Gilles ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data
Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Wiering, Marco
Language :
English
Title :
The Deep Quality-Value Family of Deep Reinforcement Learning Algorithms
Publication date :
July 2020
Journal title :
International Joint Conference on Neural Networks (IJCNN 2020)