Master’s dissertation (Dissertations and theses)
Characterization of variable importance measures derived from decision trees
Sutera, Antonio
2013

Full text :
Sutera13.pdf (author postprint, 1.51 MB)
Keywords :
machine learning; random forest; variable importances
Abstract :
[en] In machine learning, tree-based ensemble methods are common techniques for prediction and explanation in many research fields, such as genetics. These methods consist of building several decision trees by randomization and then aggregating their predictions. From an ensemble of trees, one can derive an importance score for each variable of the problem that assesses its relevance for predicting the output. Although these importance scores have been successfully exploited in many applications, they are not well understood and, in particular, they lack a theoretical characterization. This work is a first step towards a better understanding of these measures from both a theoretical and an empirical point of view. First, we derive, and verify empirically, an analytical formulation of the importance scores obtained from an ensemble of totally randomized trees in asymptotic conditions (i.e., an infinite number of trees and an infinite sample size). We then empirically study the distributions of importance scores derived from ensembles of totally randomized trees in non-asymptotic conditions for several simple input-output models. In particular, we show theoretically and empirically that, for these simple models, importance scores are insensitive to the introduction of irrelevant variables. We then evaluate the effect of reducing the randomization on importance scores and their distributions. Finally, tree-based importance measures are illustrated on a digit recognition problem.
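The asymptotic formulation mentioned in the abstract can be illustrated with a small sketch. It assumes the formulation takes the form later published by Louppe, Wehenkel, Sutera and Geurts (NIPS 2013) for Shannon-entropy impurity, where the importance of X_m among the p input variables V is Imp(X_m) = sum over k = 0..p-1 of 1/(C(p,k)(p-k)) times the sum, over all subsets B of V \ {X_m} of size k, of I(X_m; Y | B). The code evaluates this formula exactly from a known joint distribution; the toy model (Y = X_1 XOR X_2 with uniform binary inputs, padded with irrelevant uniform binary inputs) and the helper names xor_model and asymptotic_importance are hypothetical and not taken from the dissertation.

```python
from collections import defaultdict
from itertools import combinations, product
from math import comb, log2


def entropy_of(dist, idx, with_y):
    """Shannon entropy (bits) of the marginal over inputs idx, plus Y if with_y."""
    marg = defaultdict(float)
    for x, y, p in dist:
        key = tuple(x[i] for i in idx) + ((y,) if with_y else ())
        marg[key] += p
    return -sum(p * log2(p) for p in marg.values() if p > 0)


def cond_entropy_y(dist, idx):
    # H(Y | X_idx) = H(Y, X_idx) - H(X_idx)
    return entropy_of(dist, idx, True) - entropy_of(dist, idx, False)


def cmi(dist, m, B):
    # I(X_m; Y | X_B) = H(Y | X_B) - H(Y | X_B, X_m)
    return cond_entropy_y(dist, B) - cond_entropy_y(dist, B + (m,))


def asymptotic_importance(dist, p, m):
    """Assumed asymptotic importance of X_m for totally randomized trees."""
    others = [j for j in range(p) if j != m]
    total = 0.0
    for k in range(p):
        weight = 1.0 / (comb(p, k) * (p - k))
        total += weight * sum(cmi(dist, m, B) for B in combinations(others, k))
    return total


def xor_model(p):
    # Hypothetical toy model: Y = X_1 XOR X_2, inputs X_3..X_p irrelevant,
    # all p inputs independent and uniform on {0, 1}. Requires p >= 2.
    return [(x, x[0] ^ x[1], 1.0 / 2 ** p) for x in product((0, 1), repeat=p)]


for p in (2, 3, 4):
    imps = [asymptotic_importance(xor_model(p), p, m) for m in range(p)]
    print(p, [round(v, 3) for v in imps], round(sum(imps), 3))
```

On this toy case, working the formula by hand gives 0.5 bit for X_1 and X_2 and 0 for every added irrelevant input, for p = 2, 3 and 4, with a total of 1 bit = H(Y). This is a small instance of the insensitivity to irrelevant variables described above. Exact enumeration is only feasible for a handful of discrete variables, since the number of conditioning subsets grows as 2^(p-1).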
Disciplines :
Computer science
Author, co-author :
Sutera, Antonio ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Algorith. des syst. en interaction avec le monde physique
Language :
English
Title :
Characterization of variable importance measures derived from decision trees
Alternative titles :
[fr] Caractérisation des mesures d'importance de variables dérivées des arbres de décision
Defense date :
24 June 2013
Number of pages :
110 + 5
Institution :
ULiège - Université de Liège
Degree :
Master's degree in electrical engineering (Master ingénieur civil électricien)
Promotor :
Wehenkel, Louis  ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
President :
Destiné, Jacques ;  Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore)
Jury member :
Louveaux, Quentin ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Van Steen, Kristel  ;  Université de Liège - ULiège > GIGA > GIGA Medical Genomics - Biostatistics, biomedicine and bioinformatics
Geurts, Pierre ;  Université de Liège - ULiège > Montefiore Institute of Electrical Engineering and Computer Science
Available on ORBi :
since 15 October 2013
