Keywords:
Ensemble of randomized trees; Pruning; L1-norm regularization; LASSO; Supervised learning; Machine Learning; Randomization; Model reduction; Decision tree
Abstract:
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.
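The compression scheme described in the abstract can be sketched as follows: fit a forest, expose the 0/1 indicator function of every tree node as a feature, and apply an L1 (Lasso) regularization so that only a few nodes keep nonzero weights. This is a minimal illustration using scikit-learn; the dataset and hyperparameters (`alpha`, forest size) are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of L1-based forest compression: Lasso over the binary
# indicator features defined by all tree nodes. Hyperparameters are
# illustrative, not the paper's settings.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# decision_path returns, for each sample, a sparse 0/1 indicator over all
# nodes of all trees -- the set of indicator functions from the abstract.
Z, _ = forest.decision_path(X)
Z = Z.toarray().astype(float)

# L1 regularization drives most node weights to exactly zero; the nodes
# with nonzero weight form the compressed model.
lasso = Lasso(alpha=0.1, max_iter=10000).fit(Z, y)

kept = int(np.count_nonzero(lasso.coef_))
print(f"total nodes: {Z.shape[1]}, nodes kept after L1: {kept}")
```

Increasing `alpha` trades accuracy for a smaller model: stronger regularization zeroes out more node weights, shrinking the number of nodes that must be stored.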
Research center:
Systems and Modeling, GIGA-R - GIGA-Research - ULiège
Author(s):
Joly, Arnaud ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Schnitzler, François ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Geurts, Pierre ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Wehenkel, Louis ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Document language:
English
Title:
L1-based compression of random forest models
Publication date:
April 2012
Event name:
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Event organizer:
Michel Verleysen
Event location:
Bruges, Belgium
Event dates:
25 - 27 April 2012
Event scope:
International
Main work title:
20th European Symposium on Artificial Neural Networks
Peer reviewed:
Peer reviewed
Funders:
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]; BIOMAGNET IUAP network of the Belgian Science Policy Office; PASCAL2 network of excellence of the EC
References:
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
P. Geurts. Some enhancements of decision tree bagging. Principles of Data Mining and Knowledge Discovery, pages 141–148, 2000.
N. Meinshausen. Node harvest. The Annals of Applied Statistics, 4(4):2049–2072, 2010.
J.H. Friedman and B.E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954, 2008.
N. Meinshausen. Forest garrote. Electronic Journal of Statistics, 3:1288–1304, 2009.
S. Bernard, L. Heutte, and S. Adam. On the selection of decision trees in Random Forests. In Proceedings of the International Joint Conference on Neural Networks, pages 302–307, France, 2009.
G. Martínez-Muñoz, D. Hernández-Lobato, and A. Suárez. An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:245–259, February 2009.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288, 1996.
T. Hastie, J. Taylor, R. Tibshirani, and G. Walther. Forward stagewise regression and the monotone lasso. Electronic Journal of Statistics, 1:1–29, 2007.
J.H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, pages 1–67, 1991.
L. Breiman. Bias, variance, and arcing classifiers. Technical report, Statistics Department, University of California, Berkeley, 1996.