Improving the bias/variance tradeoff of decision trees - towards soft tree induction

Geurts, Pierre; Olaru, Cristina; Wehenkel, Louis

Article (Scientific journals)

2001 • In International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications, 9, p. 195-204

Peer reviewed

Permalink
https://hdl.handle.net/2268/25741

Files

Full Text

Wehenkel-paper.pdf

Publisher postprint (210.35 kB)

Details

Keywords :

machine learning

Abstract :

One of the main difficulties with standard top-down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree is strongly dependent on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, relying on threshold softening, able to significantly improve the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems, and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning in general.
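
The abstract does not give the form of the softened threshold, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a piecewise-linear transition of a chosen width around the split threshold; the function names, the `width` parameter, and the linear shape are all hypothetical choices made for this example.

```python
def soft_membership(x, threshold, width):
    """Degree in [0, 1] to which x is sent to the right branch.

    Assumed (hypothetical) softening: 0 below threshold - width/2,
    1 above threshold + width/2, linear in between.
    width <= 0 falls back to the usual hard test x >= threshold.
    """
    if width <= 0:
        return 1.0 if x >= threshold else 0.0
    z = (x - (threshold - width / 2.0)) / width
    return min(1.0, max(0.0, z))


def soft_stump_predict(x, threshold, width, left_value, right_value):
    """Blend the two leaf predictions according to the membership degree."""
    m = soft_membership(x, threshold, width)
    return (1.0 - m) * left_value + m * right_value


if __name__ == "__main__":
    # Near the threshold the output changes gradually instead of jumping.
    for x in (3.0, 4.5, 5.0, 5.5, 7.0):
        print(x, soft_stump_predict(x, threshold=5.0, width=2.0,
                                    left_value=0.0, right_value=1.0))
```

Under these assumptions, a small shift in the learned threshold changes predictions near the split only slightly instead of flipping them, which gives one intuition for why softening might reduce variance.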

Disciplines :

Computer science

Author, co-author :

Geurts, Pierre ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Olaru, Cristina ; Université de Liège - ULiège > Dép. d'électricité, électronique et informatique > Systèmes et modélisation

Wehenkel, Louis ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Systèmes et modélisation

Language :

English

Title :

Improving the bias/variance tradeoff of decision trees - towards soft tree induction

Publication date :

2001

Journal title :

International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications

ISSN :

1472-8915

eISSN :

2753-9806

Volume :

9

Pages :

195-204

Peer reviewed :

Peer reviewed

Additional URL :

http://www.montefiore.ulg.ac.be/services/stochastic/pubs/2001/GOW01

Available on ORBi :

since 15 October 2009
