Reference : Gradient Energy Matching for Distributed Asynchronous Gradient Descent
E-prints/Working papers : Already available on another site
Engineering, computing & technology : Computer science
Title : Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Language : English
Hermans, Joeri [Université de Liège - ULiège > Doct. sc. (info.)]
Louppe, Gilles [Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Big Data]
Publication date : 22-May-2018
Peer reviewed : No
Keywords : [en] Computer Science - Learning ; Computer Science - Distributed, Parallel, and Cluster Computing ; Statistics - Machine Learning
Abstract : [en] Distributed asynchronous SGD has become widely used for deep learning in large-scale systems, but it remains notorious for its instability as the number of workers grows. In this work, we study the dynamics of distributed asynchronous SGD through the lens of Lagrangian mechanics. Using this description, we introduce the concept of energy to describe the optimization process and derive a sufficient condition for its stability: the collective energy induced by the active workers must remain below the energy of a target synchronous process. Building on this criterion, we derive a stable distributed asynchronous optimization procedure, GEM, that estimates the energy of the asynchronous system and keeps it at or below the energy of sequential SGD with momentum. Experimental results highlight the stability and speedup of GEM compared to existing schemes, even when scaling to one hundred asynchronous workers. The results also indicate better generalization than the target SGD-with-momentum process.
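
As a rough illustration of the energy-matching idea in the abstract, the sketch below rescales each worker's update so that its estimated energy stays at or below that of a target momentum-SGD process. This is a minimal sketch, not the paper's exact procedure: the squared-L2-norm energy definition, the velocity-based budget, and all names (gem_scale, target_energy, mu) are assumptions made for this example.

    import numpy as np

    rng = np.random.default_rng(0)

    def gem_scale(grad, target_energy, eps=1e-12):
        """Rescale an asynchronous worker's update so that its energy
        (assumed here to be the squared L2 norm) stays at or below the
        energy budget of the target momentum-SGD process."""
        energy = float(np.sum(grad ** 2))
        if energy <= target_energy:
            return grad                      # already within the budget
        return grad * np.sqrt(target_energy / (energy + eps))

    # Hypothetical target process: momentum SGD whose velocity norm
    # defines the energy budget (an illustrative choice, not necessarily
    # the paper's exact energy definition).
    mu = 0.9                                 # momentum coefficient
    velocity = np.zeros(4)                   # target-process velocity
    for _ in range(100):                     # simulated stream of worker updates
        grad = rng.normal(size=4)            # stand-in for a worker gradient
        velocity = mu * velocity + grad      # advance the target process
        target_energy = float(np.sum(velocity ** 2))
        update = gem_scale(grad, target_energy)  # energy-matched update

Shrinking, rather than amplifying, each update is the conservative choice here: it enforces the sufficient condition that the workers' collective energy never exceeds the target's.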
Target : Researchers
Permalink : http://hdl.handle.net/2268/226232
arXiv : https://arxiv.org/abs/1805.08469

Fulltext file(s):

1805.08469.pdf — Author preprint — 639.61 kB — Open access

