Embedded Feature Generation; Monte Carlo Search; Decision Trees; Random Forests; Tree Boosting
Abstract :
[en] Feature generation is the problem of automatically constructing good features for a given target learning problem. While most feature generation algorithms belong either to the filter or to the wrapper approach, this paper focuses on embedded feature generation.
We propose a general scheme to embed feature generation in a wide range of tree-based learning algorithms, including single decision trees, random forests and tree boosting. It is based on the formalization of feature construction as a sequential decision making problem addressed by a tractable Monte Carlo search algorithm coupled with node splitting. This leads to fast algorithms that are applicable to large-scale problems. We empirically analyze the performances of these tree-based learners combined or not with the feature generation capability on several standard datasets.
Disciplines :
Computer science
Author, co-author :
Maes, Francis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Language :
English
Title :
Embedding Monte Carlo search of features in tree-based ensemble methods
Publication date :
September 2012
Event name :
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Event place :
Bristol, United Kingdom
Event date :
from 24-09-2012 to 28-09-2012
Audience :
International
Main work title :
Machine Learning and Knowledge Discovery in Data Bases
Breiman, L.: Random forests. Machine Learning 45(1), 5-32 (2001)
Caruana, R., Niculescu-Mizil, A.: An empirical comparison of supervised learning algorithms. In: ICML 2006, pp. 161-168. ACM, New York (2006)
Cazenave, T.: Nested monte-carlo search. In: Proceedings of the 21st International Joint Conference on Artificial Intelligence (IJCAI 2009), pp. 456-461 (2009)
Chaslot, G., Bakkes, S., Szita, I., Spronck, P.: Monte-carlo tree search: A new framework for game ai. In: Darken, C., Mateas, M. (eds.) Proceedings of the Fourth Artificial Intelligence and Interactive Digital Entertainment Conference (2008)
Ekárt, A., Márkus, A.: Using genetic programming and decision trees for generating structural descriptions of four bar mechanisms. Artif. Intell. Eng. Des. Anal. Manuf. 17(3), 205-220 (2003)
Espejo, P.G., Ventura, S., Herrera, F.: A survey on the application of genetic programming to classification. Trans. Sys. Man Cyber. Part C 40(2), 121-144 (2010)
Guo, H., Jack, L.B., Nandi, A.K.: Feature generation using genetic programming with application to fault classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B 35(1), 89-99 (2005)
Kégl, B., Busa-Fekete, R.: Boosting products of base classifiers. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 497-504. ACM, New York (2009)
Krzysztof, K.: Genetic programming-based construction of features for machine learning and knowledge discovery tasks. Genetic Programming and Evolvable Machines 3(4), 329-343 (2002)
Markovitch, S., Rosenstein, D.: Feature generation using general constructor functions. Machine Learning 49, 59-98 (2002)
Murthy, S.K., Kasif, S., Salzberg, S.: A system for induction of oblique decision trrees. Journal of Artificial Intelligence Research 2, 1-32 (1994)
Ng, A.Y.: Feature selection, L1 vs. L2 regularization, and rotational invariance. In: Proceedings of the Twenty-First International Conference on Machine Learning, ICML 2004. ACM, New York (2004)
Pachet, F., Roy, P.: Analytical features: a knowledge-based approach to audio feature generation. EURASIP J. Audio Speech Music Process., 1-23 (2009)
Raymer, M.L., Punch, W.F., Goodman, E.D., Kuhn, L.A.: Genetic programming for improved data mining: application to the biochemistry of protein interactions. In: Proceedings of the First Annual Conference on Genetic Programming, GECCO 1996, pp. 375-380. MIT Press, Cambridge (1996)
Smith, M.G., Bull, L.: Genetic programming with a genetic algorithm for feature construction and selection. Genetic Programming and Evolvable Machines 6(3), 265-281 (2005)
Yang, D.-S., Rendell, L., Blix, G.: A scheme for feature construction and a comparison of empirical methods. In: Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pp. 699-704. Morgan Kaufmann (1991)