Keywords :
Compression; Prepruning; Random Forest; Extremely randomized trees; Iterative model; Stagewise
Abstract :
[en] Tree-based ensemble models are heavy memory-wise, an undesirable state of affairs given today's dataset sizes, memory-constrained environments, and fitting/prediction times. In this paper, we propose the Globally Induced Forest (GIF) to remedy this problem. GIF is a fast prepruning approach that builds lightweight ensembles by iteratively deepening the current forest. It mixes local and global optimization to produce accurate predictions under memory constraints in reasonable time. We show that the proposed method is more than competitive with standard tree-based ensembles under corresponding constraints, and can sometimes even surpass much larger models.
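As a rough illustration of the deepening loop described in the abstract, here is a minimal regression-only Python sketch, assuming squared loss and extremely-randomized-trees-style random splits. It is a conceptual toy, not the authors' implementation; the names gif_sketch, node_budget, and learning_rate are illustrative only.

    import numpy as np

    def gif_sketch(X, y, n_trees=3, node_budget=50, learning_rate=0.5, seed=0):
        """Toy sketch of globally induced deepening (regression, squared loss).

        A pool of candidate nodes is shared across all trees; at each step
        the node whose globally fitted constant contribution most reduces
        the ensemble's squared error is developed and split.
        """
        rng = np.random.default_rng(seed)
        n = len(y)
        y_hat = np.zeros(n)  # current ensemble prediction on the training set
        # each candidate node is just the index set of the samples it covers;
        # every tree starts as a root node covering the whole sample
        candidates = [np.arange(n) for _ in range(n_trees)]
        for _ in range(node_budget):
            if not candidates:
                break
            residual = y - y_hat
            # globally optimal constant weight per candidate (squared loss)
            weights = [residual[idx].mean() for idx in candidates]
            # SSE reduction of fitting full weight w on |idx| samples is
            # |idx| * w**2; shrinkage rescales all gains equally, so the
            # argmax (node selection) is unchanged
            gains = [len(idx) * w * w for idx, w in zip(candidates, weights)]
            best = int(np.argmax(gains))
            idx, w = candidates.pop(best), weights[best]
            y_hat[idx] += learning_rate * w  # commit the shrunken contribution
            # split the developed node on a random feature and threshold
            # (extremely-randomized-trees style) and enqueue its children
            feat = rng.integers(X.shape[1])
            values = X[idx, feat]
            if values.min() == values.max():
                continue  # node is constant in this feature; cannot split
            thr = rng.uniform(values.min(), values.max())
            left, right = idx[values <= thr], idx[values > thr]
            if len(left) and len(right):
                candidates += [left, right]
        return y_hat

At each step, the candidate whose globally fitted constant weight yields the largest squared-error reduction is developed; shrinking that weight by the learning rate only rescales the committed contribution, not the node ranking. A real implementation would also store each developed node's (feature, threshold, weight) to predict on unseen data; the sketch tracks training-set predictions only, to keep the idea visible.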
Disciplines :
Computer science
Author, co-author :
Begon, Jean-Michel ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Joly, Arnaud ; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Language :
English
Title :
Globally Induced Forest: A Prepruning Compression Scheme
Alternative titles :
[fr] Globally Induced Forest: une méthode d'élagage
Publication date :
2017
Event name :
34th International Conference on Machine Learning
Event place :
Sydney, Australia
Event date :
August 7 to August 11, 2017
Audience :
International
Journal title :
Proceedings of Machine Learning Research
eISSN :
2640-3498
Publisher :
Microtome Publishing, Brookline, Massachusetts, United States
Special issue title :
Proceedings of the 34th International Conference on Machine Learning
Volume :
70
Pages :
420-428
Peer reviewed :
Peer Reviewed verified by ORBi
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif