Keywords:
Ensemble of randomized trees; Pruning; L1-norm regularization; LASSO; Supervised learning; Machine Learning; Randomization; Model reduction; Decision tree
Abstract:
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, in terms of their total number of nodes, is often prohibitive, especially in the context of problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that preserving or even improving the model accuracy while significantly reducing its space complexity is indeed possible.
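The compression scheme described in the abstract can be sketched as follows: fit a forest, expose the 0/1 indicator function of every tree node as a feature, and apply an L1 (Lasso) regularization so that only a few nodes keep nonzero weights. This is a minimal illustration using scikit-learn; the dataset and hyperparameters (`alpha`, forest size) are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of L1-based forest compression: Lasso over the binary
# indicator features defined by all tree nodes. Hyperparameters are
# illustrative, not the paper's settings.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, random_state=0)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# decision_path returns, for each sample, a sparse 0/1 indicator over all
# nodes of all trees -- the set of indicator functions from the abstract.
Z, _ = forest.decision_path(X)
Z = Z.toarray().astype(float)

# L1 regularization drives most node weights to exactly zero; the nodes
# with nonzero weight form the compressed model.
lasso = Lasso(alpha=0.1, max_iter=10000).fit(Z, y)

kept = int(np.count_nonzero(lasso.coef_))
print(f"total nodes: {Z.shape[1]}, nodes kept after L1: {kept}")
```

Increasing `alpha` trades accuracy for a smaller model: stronger regularization zeroes out more node weights, shrinking the number of nodes that must be stored.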
Research center:
Systems and Modeling, GIGA-R - GIGA-Research - ULiège
Author(s):
Joly, Arnaud ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Schnitzler, François ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Geurts, Pierre ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Wehenkel, Louis ; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Montefiore Institute) > Systems and Modeling
Document language:
English
Title:
L1-based compression of random forest models
Publication date:
April 2012
Event name:
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
Event organizer:
Michel Verleysen
Event location:
Bruges, Belgium
Event dates:
25 - 27 April 2012
Event scope:
International
Main work title:
20th European Symposium on Artificial Neural Networks
Peer reviewed:
Peer reviewed
Funders:
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]; BIOMAGNET IUAP network of the Belgian Science Policy Office; PASCAL2 network of excellence of the EC
References:
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
P. Geurts. Some enhancements of decision tree bagging. Principles of Data Mining and Knowledge Discovery, pages 141–148, 2000.
N. Meinshausen. Node harvest. The Annals of Applied Statistics, 4(4):2049–2072, 2010.
J.H. Friedman and B.E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954, 2008.
N. Meinshausen. Forest garrote. Electronic Journal of Statistics, 3:1288–1304, 2009.
S. Bernard, L. Heutte, and S. Adam. On the selection of decision trees in Random Forests. In Proceedings of the International Joint Conference on Neural Networks, pages 302–307, France, 2009.
G. Martínez-Muñoz, D. Hernández-Lobato, and A. Suárez. An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31:245–259, February 2009.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), pages 267–288, 1996.
T. Hastie, J. Taylor, R. Tibshirani, and G. Walther. Forward stagewise regression and the monotone lasso. Electronic Journal of Statistics, 1:1–29, 2007.
J.H. Friedman. Multivariate adaptive regression splines. The Annals of Statistics, pages 1–67, 1991.
L. Breiman. Bias, variance, and arcing classifiers. Technical report, Statistics Department, University of California, Berkeley, 1996.