Logistic discrimination using robust estimators: an influence function approach

Croux, Christophe; Haesbroeck, Gentiane; Joossens, Kristel

doi:10.1002/cjs.5550360114

Article (Scientific journals)

2008 • In Canadian Journal of Statistics, 36 (1), p. 157-174

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/28431

Files

Full Text

CJS166DEC05.pdf

Author postprint (224.3 kB)

Details

Keywords :

classification; diagnostic; discrimination; efficiency; error rate; influence function; logistic regression; robustness

Abstract :

Logistic regression is frequently used for classifying observations into two groups. Unfortunately, there are often outlying observations in a data set, and these might affect the estimated model and the associated classification error rate. In this paper, the authors study the effect of observations in the training sample on the error rate by deriving influence functions. They obtain a general expression for the influence function of the error rate, and they compute it for the maximum likelihood estimator as well as for several robust logistic discrimination procedures. Besides being of interest in their own right, the influence functions are also used to derive asymptotic classification efficiencies of different logistic discrimination rules. The authors also show how influential points can be detected by means of a diagnostic plot based on the values of the influence function.
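
As a rough illustration of the idea summarized in the abstract, the sketch below measures how much a single added training observation changes the test error rate of a maximum likelihood logistic discrimination rule. It is not the authors' analytical influence function: it is a finite-sample sensitivity curve, computed on simulated Gaussian data, and every function name (`fit_logistic_ml`, `error_rate`, `simulate`, `sensitivity`) and every simulation setting is an assumption made only for this example.

```python
import numpy as np


def fit_logistic_ml(X, y, n_iter=50, ridge=1e-8):
    """Maximum likelihood logistic regression via Newton-Raphson steps."""
    Z = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        # Clip the linear predictor to avoid overflow in exp().
        p = 1.0 / (1.0 + np.exp(-np.clip(Z @ beta, -30, 30)))
        w = p * (1.0 - p)
        grad = Z.T @ (y - p)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = Z.T @ (Z * w[:, None]) + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def error_rate(beta, X, y):
    """Fraction of observations misclassified by the rule 'predict 1 if score > 0'."""
    Z = np.column_stack([np.ones(len(X)), X])
    pred = (Z @ beta > 0).astype(int)
    return float(np.mean(pred != y))


def simulate(n, rng):
    """Two overlapping bivariate Gaussian groups (hypothetical example data)."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0)
    return X, y


def sensitivity(X_train, y_train, X_test, y_test, x_new, y_new):
    """Scaled change in test error rate after adding one training observation."""
    n = len(y_train)
    base = error_rate(fit_logistic_ml(X_train, y_train), X_test, y_test)
    X_c = np.vstack([X_train, x_new])
    y_c = np.append(y_train, y_new)
    contaminated = error_rate(fit_logistic_ml(X_c, y_c), X_test, y_test)
    return (n + 1) * (contaminated - base)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_tr, y_tr = simulate(200, rng)
    X_te, y_te = simulate(5000, rng)
    # A point deep inside group 1's region but labelled as group 0.
    outlier = np.array([6.0, 6.0])
    print(sensitivity(X_tr, y_tr, X_te, y_te, outlier, 0))
```

Evaluating `sensitivity` over a grid of candidate points, or at each training observation in turn, gives a crude empirical counterpart to the diagnostic use of influence values mentioned in the abstract.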

Disciplines :

Mathematics

Author, co-author :

Croux, Christophe

Haesbroeck, Gentiane ; Université de Liège - ULiège > Département de mathématique > Statistique (aspects théoriques)

Joossens, Kristel

Language :

English

Publication date :

2008

Journal title :

Canadian Journal of Statistics

ISSN :

0319-5724

Publisher :

Wiley-Blackwell, United States

Volume :

36

Issue :

1

Pages :

157-174

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 18 November 2009
