In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features for predicting the target variable, and the proposed method assesses the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested subsets of top-ranked features, while the models in the second chain are built on nested subsets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains and show empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. This is demonstrated first on synthetic data and then on several real-world classification benchmark problems.
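The sketch below illustrates the idea of the two error curves on a small scale. It is a minimal illustration, not the authors' reference implementation: it assumes scikit-learn is available, and the mutual-information ranking and the k-nearest-neighbours base learner are illustrative choices rather than the exact setup evaluated in the paper.

```python
# Minimal sketch of the error-curve idea: given a feature ranking, train one chain
# of models on the top-k features and another on the bottom-k features, and record
# a cross-validated error for each k. Assumes scikit-learn; the ranking method and
# base learner are illustrative choices, not necessarily those used in the paper.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def error_curve(X, y, feature_order, base_model):
    """Error (1 - accuracy) of models built on nested prefixes of feature_order."""
    errors = []
    for k in range(1, len(feature_order) + 1):
        subset = feature_order[:k]
        acc = cross_val_score(base_model, X[:, subset], y,
                              cv=5, scoring="accuracy").mean()
        errors.append(1.0 - acc)
    return np.array(errors)

X, y = load_breast_cancer(return_X_y=True)

# Example ranking: features sorted by decreasing mutual information with the target.
scores = mutual_info_classif(X, y, random_state=0)
ranking = np.argsort(scores)[::-1]  # best-ranked feature first

model = KNeighborsClassifier(n_neighbors=5)
top_curve = error_curve(X, y, ranking, model)           # chain over top-ranked features
bottom_curve = error_curve(X, y, ranking[::-1], model)  # chain over bottom-ranked features

# For a good ranking the top curve should lie below the bottom curve, especially
# for small k; the gap between the two curves reflects ranking quality.
print("top-k errors   :", np.round(top_curve[:5], 3))
print("bottom-k errors:", np.round(bottom_curve[:5], 3))
```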
Disciplines:
Computer science
Author, co-author:
Slavkov, Ivica; Jozef Stefan Institute, Ljubljana, Slovenia
Petkovic, Matej; Jozef Stefan Institute, Ljubljana, Slovenia
Geurts, Pierre; Université de Liège - ULiège > Dept. of Electrical Engineering and Computer Science (Institut Montefiore) > Algorithmics of systems interacting with the physical world
Kocev, Dragi; Jozef Stefan Institute, Ljubljana, Slovenia
Dzeroski, Saso; Jozef Stefan Institute, Ljubljana, Slovenia
Language:
English
Title:
Error curves for evaluating the quality of feature rankings