Ensembles on Random Patches

Louppe, Gilles; Geurts, Pierre

Paper published in a book (Scientific congresses and symposiums)

2012 • In Machine Learning and Knowledge Discovery in Databases

Peer reviewed

Permalink
https://hdl.handle.net/2268/130099

Files

Full Text

glouppe12.pdf

Author preprint (1.22 MB)

Annexes

glouppe12-suppl.pdf

Publisher postprint (103.5 kB)

Supplementary materials

poster.pdf

Publisher postprint (11.46 MB)

Poster presentation

slides.pdf

Publisher postprint (23.48 MB)

Oral presentation

The original publication is available at www.springerlink.com.

Details

Keywords :

ensemble methods; large-scale learning; supervised learning

Abstract :

In this paper, we consider supervised learning under the assumption that the available memory is small compared to the dataset size. This general framework is relevant in the context of big data, distributed databases and embedded systems. We investigate a very simple, yet effective, ensemble framework that builds each individual model of the ensemble from a random patch of data obtained by drawing random subsets of both instances and features from the whole dataset. We carry out an extensive and systematic evaluation of this method on 29 datasets, using decision tree-based estimators. With respect to popular ensemble methods, these experiments show that the proposed method provides on par performance in terms of accuracy while simultaneously lowering the memory needs, and attains significantly better performance when memory is severely constrained.
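
The patch-based construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sampling fractions, the choice to sample without replacement, and the one-split decision stump (a stand-in for the tree-based estimators used in the paper) are all assumptions made for illustration. Each model is fitted on a random subset of rows and columns only, and predictions are combined by majority vote.

```python
import random
from collections import Counter


def draw_patch(n_samples, n_features, frac_samples, frac_features, rng):
    """Draw one random patch: row and column indices, without replacement."""
    n_rows = max(1, int(frac_samples * n_samples))
    n_cols = max(1, int(frac_features * n_features))
    rows = rng.sample(range(n_samples), n_rows)
    cols = rng.sample(range(n_features), n_cols)
    return rows, cols


def fit_stump(X, y, rows, cols):
    """Fit a one-split stump (hypothetical base learner) on the patch only."""
    majority = Counter(y[i] for i in rows).most_common(1)[0][0]
    # (n_correct, feature, threshold, left_label, right_label)
    best = (-1, None, None, majority, majority)
    for f in cols:
        values = sorted({X[i][f] for i in rows})
        # Candidate thresholds are midpoints between consecutive observed values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = Counter(y[i] for i in rows if X[i][f] <= t)
            right = Counter(y[i] for i in rows if X[i][f] > t)
            l_lab, l_n = left.most_common(1)[0]
            r_lab, r_n = right.most_common(1)[0]
            if l_n + r_n > best[0]:
                best = (l_n + r_n, f, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, x):
    f, t, left, right = stump
    if f is None:  # no valid split was found; fall back to the majority class
        return left
    return left if x[f] <= t else right


def fit_random_patches(X, y, n_estimators=25, frac_samples=0.5,
                       frac_features=0.5, seed=0):
    """Build an ensemble in which every model sees only its own random patch."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    return [
        fit_stump(X, y, *draw_patch(n, p, frac_samples, frac_features, rng))
        for _ in range(n_estimators)
    ]


def predict(ensemble, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two classes, four informative features.
y_toy = [i % 2 for i in range(40)]
X_toy = [[y_toy[i] + ((i // 2 + j) % 4) * 0.1 for j in range(4)]
         for i in range(40)]
ensemble_toy = fit_random_patches(X_toy, y_toy)
```

In this sketch, each stump is fitted from roughly `frac_samples * frac_features` of the full data matrix, which is the sense in which smaller patches lower the memory needed per model.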

Disciplines :

Computer science

Author, co-author :

Louppe, Gilles ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Ensembles on Random Patches

Publication date :

2012

Event name :

European Conference on Machine Learning (ECML 2012)

Event organizer :

Prof. Peter Flach
Prof. Tijl De Bie
Prof. Nello Cristianini

Event place :

Bristol, United Kingdom

Event date :

From 24/09/2012 to 28/09/2012

Audience :

International

Main work title :

Machine Learning and Knowledge Discovery in Databases

Publisher :

Springer-Verlag, Berlin, Germany

ISBN/EAN :

978-3-642-33459-7

Collection name :

Lecture Notes in Computer Science, Vol. 7523

Peer reviewed :

Peer reviewed

Available on ORBi :

since 06 September 2012

Statistics

Number of views

698 (82 by ULiège)

Number of downloads

1258 (74 by ULiège)

Scopus citations®

132

Scopus citations® without self-citations

130
