Reference : Accelerating Random Forests in Scikit-Learn
Scientific congresses and symposiums : Unpublished conference/Abstract
Engineering, computing & technology : Computer science
Louppe, Gilles [Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation]
EuroScipy 2014
from 27-08-2014 to 30-08-2014
[en] machine learning ; scikit-learn ; python ; random forests
[en] Random Forests are among the most robust, accurate and versatile tools for solving machine learning tasks. Implementing the algorithm correctly and efficiently nevertheless remains challenging, and involves issues that are easily overlooked. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. The algorithmic and technical optimizations that made this possible include:

- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines;
- A dedicated sorting procedure, taking into account the properties of data;
- Shared pre-computations whenever critical.
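
To make the first and last items of the list above more concrete, here is a minimal pure-Python sketch of a split search that sorts a feature once and then updates class counts incrementally while scanning candidate thresholds. It is an illustration only, not Scikit-Learn's actual (Cython) code: the function names `gini` and `best_split`, the choice of Gini impurity, and the midpoint thresholds are all assumptions made for this example.

```python
from collections import Counter


def gini(counts, n):
    """Gini impurity of a node holding n samples with the given class counts."""
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) of the best binary split on one feature.

    The feature is sorted once. Class counts on each side are then updated
    incrementally during the scan, so each candidate threshold costs
    O(number of classes) instead of a full pass over the samples.
    Hypothetical helper for illustration; not part of any library.
    """
    order = sorted(range(len(values)), key=values.__getitem__)
    n = len(order)
    left = Counter()
    right = Counter(labels)
    best = (None, gini(right, n))  # baseline: impurity of the unsplit node

    for rank in range(n - 1):
        i = order[rank]
        # Move one sample from the right child to the left child.
        left[labels[i]] += 1
        right[labels[i]] -= 1
        nxt = values[order[rank + 1]]
        if values[i] == nxt:
            continue  # cannot split between identical feature values
        n_left = rank + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[1]:
            best = ((values[i] + nxt) / 2.0, score)
    return best
```

With `values = [2.0, 1.0, 3.0, 4.0]` and `labels = [0, 0, 1, 1]`, `best_split` returns `(2.5, 0.0)`: the threshold halfway between 2.0 and 3.0 separates the two classes perfectly. When all feature values are equal, no threshold is proposed and the first element of the result is `None`.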

Overall, we believe that lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anybody doing data analysis in Python.
Researchers ; Professionals ; Students

File(s) associated with this reference

Fulltext file(s):

Open access
slides.pdf (Author preprint, 1.94 MB)

