Ensemble of randomized trees; Pruning; L1-norm regularization; LASSO; Supervised learning; Machine Learning; Randomization; Model reduction; Decision tree
Abstract :
Random forests are effective supervised learning methods applicable to large-scale datasets. However, the space complexity of tree ensembles, measured by their total number of nodes, is often prohibitive, especially for problems with very high-dimensional input spaces. We propose to study their compressibility by applying an L1-based regularization to the set of indicator functions defined by all their nodes. We show experimentally that it is indeed possible to preserve, or even improve, model accuracy while significantly reducing space complexity.
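The compression scheme described in the abstract can be sketched as follows. This is a minimal reconstruction assuming scikit-learn (not the authors' implementation): each node of each tree in the forest defines a binary indicator feature over the samples, and fitting a Lasso on those features selects a sparse subset of nodes, which amounts to pruning the ensemble. The dataset, `alpha`, and all parameter values below are illustrative assumptions.

```python
# Hedged sketch of L1-based random forest compression: L1 regularization
# over the node indicator functions of the ensemble (illustrative only).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# Toy regression problem (assumed, for illustration).
X, y = make_regression(n_samples=300, n_features=20, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Each column of Z indicates whether a sample traverses a given node
# of some tree in the ensemble: these are the indicator functions.
Z, _ = forest.decision_path(X)
Z = np.asarray(Z.todense(), dtype=float)

# L1 regularization keeps only a sparse subset of the node indicators;
# nodes with zero weight can be removed from the model.
lasso = Lasso(alpha=1.0, max_iter=10000).fit(Z, y)
kept = np.flatnonzero(lasso.coef_)
print(f"{len(kept)} of {Z.shape[1]} nodes retained")
```

In this sketch, the compressed model is the linear combination of the retained node indicators; the space saving comes from discarding all nodes whose Lasso coefficient is exactly zero.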
Research center :
Systèmes et modélisation (Systems and Modeling)
GIGA-R - Giga-Research - ULiège
Funders :
FRIA - Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture [BE]
BELSPO - Belgian Science Policy Office [BE]
EU - European Union [BE]
Funding text :
F. Schnitzler is supported by a F.R.I.A. scholarship. This work was funded by the Biomagnet IUAP network of the Belgian Science Policy Office and the Pascal2 network of excellence of the EC.
Commentary :
An extended abstract presenting the article "Joly, A., Schnitzler, F., Geurts, P., & Wehenkel, L. (2012). L1-based compression of random forest models. 20th European Symposium on Artificial Neural Networks.", which also led to an oral presentation.