Tree-based ensemble methods, such as random forests and extremely randomized trees, are among the methods of choice for handling high-dimensional problems. One important drawback of these methods, however, is the complexity of the models (i.e., the large number and size of trees) they produce to achieve good performance.
In this work, several research directions are identified to address this problem. Among them, we have developed the following one. From a tree ensemble, one can extract a set of binary features, each associated with a leaf or an internal node of a tree, which is true for a given object only if that object reaches the corresponding leaf or node when propagated through the tree. Given this representation, the prediction of the ensemble can be recovered simply by linearly combining these characteristic features with appropriate weights. We apply a linear feature selection method, namely the monotone LASSO, to these features in order to simplify the tree ensemble. A subtree is then pruned as soon as none of the characteristic features corresponding to its constituent nodes are selected in the linear model.
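The feature-extraction step described above can be sketched with scikit-learn, whose `decision_path` method directly yields the binary node-indicator matrix. This is a minimal illustration, not the paper's implementation: scikit-learn provides no monotone LASSO, so an ordinary LASSO is used here as a stand-in, and all parameter values (forest size, regularization strength) are arbitrary choices for the sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

# Toy data and a small forest (sizes chosen arbitrarily for illustration).
X, y = make_regression(n_samples=200, n_features=20, random_state=0)
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Binary characteristic features: entry (i, j) is 1 iff sample i
# traverses node j of some tree in the ensemble.
indicators, _ = forest.decision_path(X)
Z = indicators.toarray().astype(float)

# Ordinary LASSO as a stand-in for the paper's monotone LASSO:
# node features with zero coefficients are unselected, and a subtree
# none of whose node features are selected could be pruned.
lasso = Lasso(alpha=0.1, max_iter=10000).fit(Z, y)
selected = np.flatnonzero(lasso.coef_)
print(f"{len(selected)} of {Z.shape[1]} node features selected")
```

In practice the sparsity of `lasso.coef_` would then be mapped back, per tree, onto the node structure to decide which subtrees can be removed.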
Empirical experiments show that combining the monotone LASSO with features extracted from tree ensembles simultaneously yields a drastic reduction in the number of features and can improve accuracy with respect to unpruned ensembles of trees.