Multivariate relative dispersion
Aerts, Stéphanie; Haesbroeck, Gentiane; Ruwet, Christel. Conference (2020, February 26).
In the univariate context, coefficients of variation (CVs) are widely used to compare the relative dispersion of a variable across several populations. When the comparison is based on p characteristics, however, side-by-side comparison of marginal CVs may lead to contradictory results. In response, several proposals for multivariate coefficients of variation (MCVs) have been introduced and used in the literature. These are measures of relative dispersion that summarize the multivariate information into one single index. Depending on the context (flat data, invariance requirement, ...), one of these proposals might be more appropriate. Whichever definition is chosen, however, in practice all coefficients can be estimated by plugging any pair of location and covariance estimators into their definition. In this talk, some of the properties (bias, robustness, ...) of the resulting estimators will be reviewed and discussed. The construction of confidence intervals and tests for comparing the relative dispersion in multivariate data will also be considered. Examples related to finance and analytical chemistry will be used throughout the talk as illustration.

Predictive Maintenance of Technical Faults in Aircraft
Peters, Florian; Aerts, Stéphanie; Schyns, Michael. Conference (2020, January 30).
A key issue for handlers in the air cargo industry is arrival delays due to aircraft maintenance. This work focuses on a particular type of delay caused by technical faults, called technical delays. Using real data from a cargo handling company, different classification models that can predict technical delay occurrence are compared. A new decision tree extension is also proposed, based on a study by Hoffait & Schyns (2017). The final results provide a good starting point for future research.

Predictive Maintenance of Technical Faults Using Machine Learning: A Case Study in Air Transport
Peters, Florian; Aerts, Stéphanie; Schyns, Michael. E-print/Working paper (2020).

Completion/Drop out and Time to Degree in the PhD track: A Conceptual Literature Review
Aerts, Stéphanie; Haesbroeck, Gentiane; Klenkenberg, Sophie. E-print/Working paper (2020).
Understanding the attrition among PhD candidates and modelling the mean or median time to degree according to certain factors have long been at the center of research interest in higher education. This paper presents a conceptual literature review divided into two parts. First, the multitude of factors used to explain the doctoral outcome is presented and results concerning their impact are summarized. We also propose explanations for potentially controversial findings and advise on the proper use of these factors. Then, an overview of the most common statistical techniques employed is given. The authors argue that some techniques are better suited than others to such data and the issues of interest, and they give recommendations about the most appropriate techniques.
They also highlight some of the key points that are essential to consider in order to conduct an adequate statistical analysis when dealing with PhD data. An overall conclusion is that there is much variability in the factors and their definitions, and the same comment holds for the techniques. This explains why studies end up with different (sometimes contradictory) results.

Cellwise robust regularized precision matrices for discriminant analysis
; Aerts, Stéphanie. Conference (2019, December).
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. We propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (i) are robust against outlying cells, (ii) provide, as a by-product, a way to detect outliers, (iii) cover the path between LDA and QDA, and (iv) are computable in high dimensions.
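The regularization idea recurring in these abstracts, covering the path between QDA (one covariance per group) and LDA (one pooled covariance) with a plug-in covariance estimator, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' exact estimator: the simple convex-combination scheme and all function names are hypothetical, and `cov_fn` stands in for any plug-in estimator, including a cellwise robust one.

```python
import numpy as np

def pooled_path_covariances(X_groups, lam, cov_fn=np.cov):
    """Convex combination between groupwise covariances (QDA, lam=0)
    and the pooled covariance (LDA, lam=1).  `cov_fn` is any plug-in
    covariance estimator; a cellwise robust one could be passed here."""
    covs = [cov_fn(X.T) for X in X_groups]          # one estimate per group
    ns = [len(X) for X in X_groups]
    pooled = sum(n * S for n, S in zip(ns, covs)) / sum(ns)
    return [(1.0 - lam) * S + lam * pooled for S in covs]

def qda_scores(x, means, covs, priors):
    """Gaussian discriminant scores for one observation x;
    the rule assigns x to the group with the largest score."""
    scores = []
    for m, S, p in zip(means, covs, priors):
        diff = x - m
        _, logdet = np.linalg.slogdet(S)             # stable log-determinant
        maha = diff @ np.linalg.solve(S, diff)       # Mahalanobis distance
        scores.append(np.log(p) - 0.5 * logdet - 0.5 * maha)
    return np.array(scores)
```

Intermediate values of `lam` give the rules between the two extremes; at `lam=1` all groups share one covariance, so the rule reduces to LDA.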
Regularized methods in Statistics
Aerts, Stéphanie. Scientific conference (2018, September 06).

Regularized discriminant analysis: a cellwise robust approach
Aerts, Stéphanie. Conference (2018, July 03).
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. In this talk, we propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (i) are robust against outlying cells, (ii) cover the gap between LDA and QDA, and (iii) are computable in high dimension. The good performance of the new methods is illustrated through simulated and real data examples.
Robust Multivariate Dispersion Measures
Aerts, Stéphanie. Doctoral thesis (2018).

Distribution under elliptical symmetry of a distance-based multivariate coefficient of variation
Aerts, Stéphanie; Haesbroeck, Gentiane; Ruwet, Christel. In: Statistical Papers (2018).

Regularized Discriminant Analysis in Presence of Cellwise Contamination
Aerts, Stéphanie ; . Conference (2017, August 01).
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. We propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that are robust against outlying cells, cover the gap between LDA and QDA, and are computable in high dimension.
Cellwise robust regularized discriminant analysis
; Aerts, Stéphanie. Conference (2017, July).
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. We propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (i) are robust against outlying cells, (ii) provide, as a by-product, a way to detect outliers, (iii) cover the path between LDA and QDA, and (iv) are computable in high dimensions.

Cellwise robust regularized discriminant analysis
Aerts, Stéphanie ; . Conference (2017, June 02).
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group.
If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. We propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that are robust against outlying cells, cover the gap between LDA and QDA, and are computable in high dimension.

Résultats des enquêtes "Attrait des Sciences" (années académiques 2015–2016 et 2016–2017)
Aerts, Stéphanie; Ernst, Marie. Report (2017).
At the start of the 2015–2016 and 2016–2017 academic years, a survey was sent to first-time first-year students enrolled in the Faculty of Sciences. The survey asks students about their secondary-school background, their choice of study programme, their professional future, and their aptitudes. This report consists of a descriptive and exploratory analysis of the results of the two surveys.
Cellwise Robust regularized discriminant analysis
Aerts, Stéphanie ; . In: Statistical Analysis and Data Mining (2017), 10.
Quadratic and Linear Discriminant Analysis (QDA/LDA) are the most often applied classification rules under normality. In QDA, a separate covariance matrix is estimated for each group. If there are more variables than observations in the groups, the usual estimates are singular and cannot be used anymore. Assuming homoscedasticity, as in LDA, reduces the number of parameters to estimate. This rather strong assumption is, however, rarely verified in practice. Regularized discriminant techniques that are computable in high dimension and cover the path between the two extremes, QDA and LDA, have been proposed in the literature. However, these procedures rely on sample covariance matrices. As such, they become inappropriate in the presence of cellwise outliers, a type of outlier that is very likely to occur in high-dimensional datasets. In this paper, we propose cellwise robust counterparts of these regularized discriminant techniques by inserting cellwise robust covariance matrices. Our methodology results in a family of discriminant methods that (i) are robust against outlying cells, (ii) cover the gap between LDA and QDA, and (iii) are computable in high dimension. The good performance of the new methods is illustrated through simulated and real data examples. As a by-product, visual tools are provided for the detection of outliers.

Robust asymptotic tests for the equality of multivariate coefficients of variation
Aerts, Stéphanie; Haesbroeck, Gentiane. In: TEST (2017), 26(1), 163–187.
In order to easily compare several populations on the basis of more than one feature, multivariate coefficients of variation (MCVs) may be used, as they summarize relative dispersion in a single index. However, to date, no test of equality of one or more MCVs has been developed in the literature. In this paper, several classical and robust Wald-type tests are proposed and studied. The asymptotic distributions of the test statistics are derived under elliptical symmetry, and the asymptotic efficiency of the robust versions is compared to that of the classical tests. Robustness of the proposed procedures is examined through partial influence functions of the test statistic, as well as by means of power and level influence functions. A simulation study compares the performance of the classical and robust tests under uncontaminated and contaminated schemes, and the difference with the usual covariance homogeneity test is highlighted. As a by-product, these tests may also be considered in the univariate context, where they yield procedures that are both robust and easy to use. They provide an interesting alternative to the numerous parametric tests for comparing univariate coefficients of variation in the literature, which are, in most cases, unreliable in the presence of outliers. The methods are illustrated on a real data set.

A full inference toolbox to measure multivariate relative dispersion
Aerts, Stéphanie; Haesbroeck, Gentiane. Conference (2016, October 14).
In the univariate context, coefficients of variation (CVs) are widely used to compare the dispersion of a variable in several populations.
When the comparison is based on p characteristics, however, side-by-side comparison of marginal CVs may lead to contradictions. In this talk, we present a multivariate coefficient of variation (MCV), defined as the inverse of the Mahalanobis distance between the mean and the origin, whose usefulness is demonstrated in applications in finance and analytical chemistry. A full inference toolbox is provided for practitioners: several parametric and non-parametric bias-correction methods are suggested and compared, and some exact and approximate confidence intervals are built and analyzed in a simulation study. Finally, in order to meet the practical need to compare MCVs in K populations, some asymptotic statistical testing procedures are derived, whose finite-sample performance is empirically assessed. Throughout the talk, the robustness of the techniques will be discussed. As a by-product, a test statistic allowing K univariate CVs to be reliably compared even in the presence of outliers will be outlined.

Robust discriminant analysis based on the joint graphical lasso estimator
Aerts, Stéphanie ; ; . Poster (2016, October).
Linear and Quadratic Discriminant Analysis (LDA/QDA) are the most often applied classification rules under the normality assumption. When there is not enough data, the quadratic rule, which requires the estimation of one precision matrix in each class, is often replaced by the linear one, based on the homoscedasticity assumption. This strong assumption is, however, rarely verified in practice and ignores the intrinsic differences between groups that may be of particular interest in the classification context.
In this paper, alternatives to the usual maximum likelihood estimates for the precision matrices are proposed that borrow strength across classes while allowing for heterogeneity at the same time. This results in a classifier that is intermediate between QDA and LDA. Moreover, our estimator is sparse, which reduces the undesirable effect of uninformative variables. The performance of the method is illustrated through simulated and real data examples.

Multivariate coefficients of variation: a full inference toolbox
Aerts, Stéphanie; Haesbroeck, Gentiane. Conference (2015, December 13).
The univariate coefficient of variation (CV) is a widely used measure to compare the relative dispersion of a variable in several populations. When the comparison is based on p characteristics, however, side-by-side comparison of marginal CVs may lead to contradictions. Several multivariate coefficients of variation (MCVs) have been introduced and used in the literature but, so far, their properties have not been much studied. Based on one of them, namely the inverse of the Mahalanobis distance between the mean and the origin, this talk intends to demonstrate the usefulness of MCVs in several domains (finance and analytical chemistry) as well as to provide a complete inference toolbox for practitioners. Some exact and approximate confidence intervals are constructed, whose performance is analyzed through simulations. Several bias-correction methods, either parametric or not, are suggested and compared. Finally, since MCVs are used for comparison purposes, some test statistics are proposed for testing the homogeneity of MCVs in K populations. Throughout the talk, the robustness of the techniques will be discussed.
As a by-product, a test statistic allowing K univariate CVs to be reliably compared even in the presence of outliers will be outlined.

Multivariate coefficients of variation: Comparison and influence functions
Aerts, Stéphanie; Haesbroeck, Gentiane; Ruwet, Christel. In: Journal of Multivariate Analysis (2015), 142.
In the univariate setting, coefficients of variation are well known and used to compare the variability of populations characterized by variables expressed in different units or having very different means. When dealing with more than one variable, the use of such a relative dispersion measure is much less common, even though several generalizations of the coefficient of variation to the multivariate setting have been introduced in the literature. In this paper, the lack of robustness of the sample versions of the multivariate coefficients of variation (MCVs) is illustrated by means of influence functions, and some robust counterparts based either on the Minimum Covariance Determinant (MCD) estimator or on the S estimator are advocated. Then, focusing on two of the considered MCVs, a diagnostic tool is derived and its efficiency in detecting observations having an unduly large effect on variability is illustrated on a real-life data set. The influence functions are also used to compute asymptotic variances under elliptical distributions, yielding approximate confidence intervals. Finally, simulations are conducted in order to compare, in a finite-sample setting, the performance of the classical and robust MCVs in terms of variability and in terms of coverage probability of the corresponding asymptotic confidence intervals.
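The distance-based MCV recurring in these abstracts, the inverse Mahalanobis distance between the mean and the origin, is a plug-in functional: any location/scatter pair (sample estimates, MCD, S estimator, ...) can be inserted. A minimal numpy sketch on made-up data, using only the classical sample estimates, which also shows how a single outlying observation shifts the classical plug-in:

```python
import numpy as np

def mcv(location, scatter):
    """Distance-based multivariate coefficient of variation:
    gamma = (m' S^{-1} m)^{-1/2}, the inverse Mahalanobis distance
    of the location m from the origin.  Any location/scatter pair
    (sample, MCD, S-estimator, ...) can be plugged in."""
    m = np.asarray(location, dtype=float)
    d2 = m @ np.linalg.solve(scatter, m)    # squared Mahalanobis distance
    return 1.0 / np.sqrt(d2)

rng = np.random.default_rng(1)
X = rng.normal(10.0, 2.0, size=(200, 2))          # clean bivariate sample
gamma = mcv(X.mean(axis=0), np.cov(X.T))           # classical plug-in

Xc = np.vstack([X, [200.0, 200.0]])                # one far-out observation
gamma_c = mcv(Xc.mean(axis=0), np.cov(Xc.T))       # estimate shifts markedly
```

Replacing `X.mean(axis=0)` and `np.cov(X.T)` by a robust location/scatter pair would leave `gamma_c` close to `gamma`, which is the motivation for the robust counterparts studied in the papers above.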
Robustifier nos analyses de données: un must!
Aerts, Stéphanie. Conference given outside the academic context (2015).
Financiers, biologists, psychologists, economists, chemists: all use statistical tools to predict, explain, model, and understand processes specific to their field. Yet the results of any statistical analysis, from the simplest to the most complex, can be strongly influenced, or even reversed, by the presence of a single atypical value. While this problem is fairly easy to overcome for simple data sets, when several characteristics are measured simultaneously, having robust methods available requires algorithms that combine the current power of our computers with mathematical theorems that are sometimes more than 50 years old. At a time when ever larger databases of uneven quality are available, it is essential that practitioners become familiar with robust statistics in order to make their conclusions reliable.
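The talk's central claim, that a single atypical value can reverse the conclusion of an analysis, admits a one-screen numerical illustration. The data below are invented for illustration only:

```python
import numpy as np

x = np.arange(10.0)
y = x.copy()                               # perfect positive trend
r_clean = np.corrcoef(x, y)[0, 1]          # correlation is 1

x_out = np.append(x, 30.0)                 # a single atypical point ...
y_out = np.append(y, -30.0)
r_cont = np.corrcoef(x_out, y_out)[0, 1]   # ... flips the sign of r
```

Robust alternatives (here, e.g., a rank correlation or an MCD-based correlation) would still report the positive trend of the ten original points.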