Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most frequently applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If the groups contain fewer observations than variables, the usual estimates are singular and can no longer be used. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. However, this rather strong assumption rarely holds in practice. Regularized discriminant techniques that are computable in high dimensions and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets.
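To make the singularity issue and the QDA-to-LDA path concrete, the following is a minimal sketch in Python with NumPy. It is not the estimator studied in the paper: the convex combination in `blended_covariance` is only one simple way to move between a group-specific and a common covariance matrix, and all function names, the weight `lam`, and the simulated dimensions are assumptions made for illustration.

```python
import numpy as np


def blended_covariance(S_group, S_common, lam):
    """Convex combination of two covariance matrices (illustrative only).

    lam = 0 returns the group-specific matrix (QDA-like),
    lam = 1 returns the common matrix (LDA-like).
    """
    return (1.0 - lam) * S_group + lam * S_common


def quadratic_score(x, mean, cov, prior):
    """Gaussian discriminant score of x for one group (larger is better).

    score = log(prior) - 0.5 * log|cov| - 0.5 * (x - mean)' cov^{-1} (x - mean)
    np.linalg.cholesky raises LinAlgError when cov is not numerically
    positive definite, which is what happens with singular estimates.
    """
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)          # z'z is the Mahalanobis distance
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return np.log(prior) - 0.5 * logdet - 0.5 * (z @ z)


# A group with fewer observations (n) than variables (p):
rng = np.random.default_rng(0)
n, p = 10, 20
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)                   # p x p, rank at most n - 1
rank_S = np.linalg.matrix_rank(S)

# Hypothetical stand-in for a well-conditioned common matrix:
S_common = np.eye(p)
S_blend = blended_covariance(S, S_common, lam=0.5)
min_eig_blend = np.linalg.eigvalsh(S_blend).min()
```

Here `rank_S` is smaller than `p`, so `S` cannot be inverted, while the blended matrix is positive definite because half of its weight sits on the identity stand-in.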
In this paper, we propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (i) are robust against outlying cells, (ii) cover the gap between LDA and QDA, and (iii) are computable in high dimensions. The good performance of the new methods is illustrated through simulated and real data examples. As a by-product, visual tools are provided for the detection of outliers.
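As an illustration of what replacing a sample covariance matrix by a cellwise robust one can look like, the sketch below builds a covariance matrix from robust columnwise scales and pairwise rank correlations. This construction is an assumption chosen for simplicity (MAD scales combined with Spearman correlations); the abstract does not specify which cellwise robust estimators the paper uses, and the function names and the simulated contamination are hypothetical.

```python
import numpy as np


def mad_scale(X):
    """Columnwise median absolute deviation, rescaled by 1.4826 so that it
    estimates the standard deviation under normality."""
    med = np.median(X, axis=0)
    return 1.4826 * np.median(np.abs(X - med), axis=0)


def spearman_correlation(X):
    """Pairwise Spearman correlation, computed as the Pearson correlation of
    columnwise ranks (ties are not averaged in this sketch)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def pairwise_robust_covariance(X):
    """Covariance built entry by entry: robust scale_j * scale_k * rank corr_jk."""
    scale = mad_scale(X)
    return spearman_correlation(X) * np.outer(scale, scale)


# Simulated data with cellwise contamination: in each column, a different
# block of 10 cells is replaced by values near 100, so no row is fully outlying.
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
for j in range(p):
    X[10 * j:10 * (j + 1), j] = 100.0 + rng.standard_normal(10)

classical = np.cov(X, rowvar=False)
robust = pairwise_robust_covariance(X)
```

In this simulated example the diagonal of `classical` is inflated by the contaminated cells, whereas the diagonal of `robust` stays close to the variance of the clean data. A matrix of this kind could then take the place of the sample covariance matrix inside a regularized discriminant rule.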
Disciplines :
Mathematics
Author, co-author :
Aerts, Stéphanie ; Université de Liège > HEC Liège : UER > UER Opérations : Informatique de gestion
Wilms, Ines ; Katholieke Universiteit Leuven - KUL > Faculty of Economics and Business > Leuven Statistics Research Centre (LStat)