Paper published in a book (Scientific congresses and symposiums)
Random Subspace with Trees for Feature Selection Under Memory Constraints
Sutera, Antonio; Châtel, Célia; Louppe, Gilles et al.
2018, in Storkey, Amos; Perez-Cruz, Fernando (Eds.), Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
Peer reviewed
 

Files


Full Text
sutera18a.pdf
Publisher postprint (1.29 MB)
Annexes
sutera18a-supp.pdf
Publisher postprint (421 kB)
Supplementary materials





Details



Keywords :
machine learning; random forest; variable importances; random subspace; feature selection
Abstract :
[en] Dealing with datasets of very high dimension is a major challenge in machine learning. In this paper, we consider the problem of feature selection in applications where the memory is not large enough to contain all features. In this setting, we propose a novel tree-based feature selection approach that builds a sequence of randomized trees on small subsamples of variables, mixing variables already identified as relevant by previous models with variables randomly selected among the other variables. As our main contribution, we provide an in-depth theoretical analysis of this method in the infinite-sample setting. In particular, we study its soundness with respect to common definitions of feature relevance and its convergence speed under various variable dependence scenarios. We also provide some preliminary empirical results highlighting the potential of the approach.
Disciplines :
Electrical & electronics engineering
Author, co-author :
Sutera, Antonio ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Châtel, Célia
Louppe, Gilles  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Big Data
Wehenkel, Louis  ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation
Geurts, Pierre ;  Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Algorith. des syst. en interaction avec le monde physique
Language :
English
Title :
Random Subspace with Trees for Feature Selection Under Memory Constraints
Publication date :
2018
Event name :
The 21st International Conference on Artificial Intelligence and Statistics
Event place :
Playa Blanca, Spain
Event date :
9 to 11 April 2018
Audience :
International
Main work title :
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
Editor :
Storkey, Amos
Perez-Cruz, Fernando
Publisher :
PMLR, Playa Blanca, Spain
Collection name :
Proceedings of Machine Learning Research
Pages :
929-937
Peer reviewed :
Peer reviewed
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif
Funders :
CÉCI - Consortium des Équipements de Calcul Intensif [BE]
Available on ORBi :
since 29 June 2018

Statistics


Number of views
113 (30 by ULiège)
Number of downloads
196 (21 by ULiège)

Scopus citations®
 
2
Scopus citations® without self-citations
1
