Paper published in a book (Scientific congresses and symposiums)
Understanding variable importances in forests of randomized trees
Louppe, Gilles; Wehenkel, Louis; Sutera, Antonio et al.
2013, in Advances in Neural Information Processing Systems 26
Peer reviewed
 

Files


Full Text
louppe13.pdf
Author preprint (324.81 kB)
Main article
Full Text Parts
louppe13-suppl.pdf
Author preprint (233.92 kB)
Supplementary materials
Annexes
poster.pdf
Publisher postprint (352.54 kB)
Poster
slides.pdf
Publisher postprint (115.92 kB)
Spotlight

All documents in ORBi are protected by a user license.




Details



Keywords :
machine learning; random forest; variable importances
Abstract :
Despite growing interest and practical use in various scientific areas, variable importances derived from tree-based ensemble methods are not well understood from a theoretical point of view. In this work, we characterize the Mean Decrease Impurity (MDI) variable importances as measured by an ensemble of totally randomized trees in asymptotic sample and ensemble size conditions. We derive a three-level decomposition of the information jointly provided by all input variables about the output in terms of i) the MDI importance of each input variable, ii) the degree of interaction of a given input variable with the other input variables, iii) the different interaction terms of a given degree. We then show that the MDI importance of a variable is equal to zero if and only if the variable is irrelevant and that the MDI importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. We illustrate these properties on a simple example and discuss how they may change in the case of non-totally randomized trees such as Random Forests and Extra-Trees.
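The properties claimed in the abstract can be checked numerically on a toy problem. The sketch below is not the authors' code (see the repository linked under Commentary for that). It assumes the closed-form expression that the paper gives for totally randomized trees in asymptotic conditions, Imp(X_m) = sum over k = 0..p-1 of 1 / (C(p, k) (p - k)) times the sum of I(X_m; Y | B) over all subsets B of size k of the other inputs. The function names (`entropy`, `cond_mi`, `mdi_importance`) and the XOR example are hypothetical choices made for illustration.

```python
from collections import defaultdict
from itertools import combinations, product
from math import comb, log2


def entropy(joint, idx):
    """Shannon entropy (in bits) of the marginal over the columns in idx."""
    marginal = defaultdict(float)
    for outcome, prob in joint.items():
        marginal[tuple(outcome[i] for i in idx)] += prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)


def cond_mi(joint, m, y, cond):
    """I(X_m; Y | B) = H(X_m, B) + H(Y, B) - H(X_m, Y, B) - H(B)."""
    cond = tuple(cond)
    return (entropy(joint, (m,) + cond) + entropy(joint, (y,) + cond)
            - entropy(joint, (m, y) + cond) - entropy(joint, cond))


def mdi_importance(joint, m, inputs, y):
    """Asymptotic MDI importance of input m (assumed closed form, see lead-in)."""
    p = len(inputs)
    others = [j for j in inputs if j != m]
    total = 0.0
    for k in range(p):
        # Weight of the conditioning subsets of size k.
        weight = 1.0 / (comb(p, k) * (p - k))
        total += weight * sum(cond_mi(joint, m, y, b)
                              for b in combinations(others, k))
    return total


# Toy joint distribution (an assumption for illustration): three uniform
# binary inputs in columns 0..2, output Y = X0 XOR X1 in column 3.
# X2 is independent of Y, hence irrelevant.
joint = {(x0, x1, x2, x0 ^ x1): 1 / 8
         for x0, x1, x2 in product((0, 1), repeat=3)}

with_irrelevant = [mdi_importance(joint, m, [0, 1, 2], 3) for m in (0, 1, 2)]
without_irrelevant = [mdi_importance(joint, m, [0, 1], 3) for m in (0, 1)]
total_information = entropy(joint, (3,))  # equals I(X0, X1, X2; Y) here

print(with_irrelevant, without_irrelevant, total_information)
```

In this example the two relevant inputs each receive an importance of 0.5 bit, the irrelevant input receives 0, the importances sum to the 1 bit of information that the inputs carry about the output, and dropping the irrelevant input leaves the other two values unchanged.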
Disciplines :
Computer science
Author, co-author :
Louppe, Gilles; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Wehenkel, Louis; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Sutera, Antonio; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Understanding variable importances in forests of randomized trees
Publication date :
December 2013
Event name :
Neural Information Processing Systems Conference 2013
Event place :
Lake Tahoe, United States
Event date :
December 5-10, 2013
Audience :
International
Main work title :
Advances in Neural Information Processing Systems 26
Peer reviewed :
Peer reviewed
Commentary :
Demo and source code available at https://github.com/glouppe/paper-variable-importances
Available on ORBi :
since 09 September 2013

Statistics


Number of views
2487 (229 by ULiège)
Number of downloads
5127 (181 by ULiège)

Scopus citations®
784
Scopus citations® without self-citations
773
