machine learning; scikit-learn; python; random forests
Abstract :
[en] Random Forests are without question one of the most robust, accurate and versatile tools for solving machine learning tasks. Implementing this algorithm properly and efficiently, however, remains a challenging task, involving issues that are easily overlooked if not considered with care. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. The algorithmic and technical optimizations that have made this possible include:
- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines (see the sketch after this list);
- A dedicated sorting procedure, taking into account the properties of the data;
- Shared pre-computations whenever critical.
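As a user-facing illustration of the memory-layout and multi-threading points above, the minimal sketch below (assuming recent versions of NumPy and scikit-learn; the data set and sizes are purely illustrative) shows how these optimizations can be exploited from Python. The explicit float32 conversion mirrors the layout used internally by the tree code, an implementation detail that may vary across versions.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Toy classification problem; the sizes are illustrative only.
    X, y = make_classification(n_samples=10000, n_features=50, random_state=0)

    # The tree induction code operates on 32-bit, contiguous feature arrays
    # internally (an implementation detail that may change across versions);
    # providing the data in that layout up front avoids an extra copy at fit time.
    X = np.ascontiguousarray(X, dtype=np.float32)

    # n_jobs=-1 distributes tree construction over all available cores; because
    # the underlying Cython routines release the GIL, trees can be built in
    # parallel rather than being serialized by the interpreter.
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X, y)
    print(clf.score(X, y))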
Overall, we believe that lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anybody doing data analysis in Python.
Disciplines :
Computer science
Author, co-author :
Louppe, Gilles ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation