One of the main difficulties with standard top-down induction of decision trees comes from the high variance of these methods. High variance means that, for a given problem and sample size, the resulting tree depends strongly on the random nature of the particular sample used for training. Consequently, these algorithms tend to be suboptimal in terms of accuracy and interpretability. This paper analyses this problem in depth and proposes a new method, based on threshold softening, that significantly improves the bias/variance tradeoff of decision trees. The algorithm is validated on a number of benchmark problems, and its relationship with fuzzy decision tree induction is discussed. This sheds some light on the success of fuzzy decision tree induction and improves our understanding of machine learning in general.
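The threshold-softening idea mentioned in the abstract can be illustrated with a minimal sketch (this is an illustrative toy, not the paper's actual algorithm): a hard split routes a sample entirely to one child node, so a small perturbation of an attribute value near the threshold flips the routing completely, whereas a soft split assigns graded memberships that vary smoothly around the threshold. The sigmoid form and the softness parameter `beta` below are assumptions chosen for illustration.

```python
import math

def hard_split(x, threshold):
    """Hard split: the sample goes entirely to the left child (1.0) or not (0.0)."""
    return 1.0 if x < threshold else 0.0

def soft_split(x, threshold, beta=5.0):
    """Soft split: sigmoid membership degree for the left child.

    The membership degrades smoothly around the threshold; `beta` is an
    illustrative softness constant (beta -> infinity recovers the hard split).
    The right-child membership is 1 minus this value.
    """
    return 1.0 / (1.0 + math.exp(beta * (x - threshold)))

# Near the threshold, a tiny perturbation of x flips the hard split entirely,
# but only slightly changes the soft memberships:
print(hard_split(0.99, 1.0), hard_split(1.01, 1.0))                       # 1.0 0.0
print(round(soft_split(0.99, 1.0), 3), round(soft_split(1.01, 1.0), 3))   # 0.512 0.488
```

This smooth dependence on the attribute value is one intuition for why softened thresholds can reduce the variance of the induced model with respect to the training sample.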
Disciplines :
Computer science
Author, co-author :
Geurts, Pierre; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Olaru, Cristina; Université de Liège - ULiège > Dép. d'électricité, électronique et informatique > Systèmes et modélisation
Wehenkel, Louis; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Systèmes et modélisation
Language :
English
Title :
Improving the bias/variance tradeoff of decision trees - towards soft tree induction
Publication date :
2001
Journal title :
International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications