We present an extension of the logistic regression procedure for identifying dichotomous differential item functioning (DIF) when there are more than two groups of respondents. Starting from the usual framework of a single focal group, we propose a general approach to estimate the item response functions in each group and to test for the presence of uniform DIF, nonuniform DIF, or both. This generalized procedure is compared with other existing DIF methods for multiple groups on a real data set on language skill assessment. Emphasis is placed on the flexibility, completeness, and computational ease of the generalized method.
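As a reading aid, the kind of model the abstract describes can be sketched as follows. The notation is an assumption made here for illustration and is not taken from this record: \pi_{ig} is the probability that respondent i in group g answers the item correctly, S_i is the matching variable (typically the test score), g = 0 is the reference group, and g = 1, ..., F index the focal groups.

```latex
% Illustrative sketch only; symbols are assumed, not quoted from the paper.
\[
  \operatorname{logit}(\pi_{ig}) = \alpha + \beta S_i + \alpha_g + \beta_g S_i ,
  \qquad g = 0, 1, \dots, F, \qquad \alpha_0 = \beta_0 = 0 .
\]
% Null hypotheses for the three kinds of test:
\begin{align*}
  \text{uniform DIF:}    \quad & H_0 : \alpha_1 = \dots = \alpha_F = 0
                                 \quad (\text{with } \beta_1 = \dots = \beta_F = 0 \text{ assumed}), \\
  \text{nonuniform DIF:} \quad & H_0 : \beta_1 = \dots = \beta_F = 0 , \\
  \text{both:}           \quad & H_0 : \alpha_1 = \dots = \alpha_F = \beta_1 = \dots = \beta_F = 0 .
\end{align*}
```

Under this parameterization, a likelihood ratio comparison of the nested models would have F degrees of freedom for each of the first two hypotheses and 2F for the third; with F = 1 the model reduces to the familiar logistic regression DIF setup for a single focal group.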
Disciplines :
Mathematics
Author, co-author :
Magis, David ; Université de Liège - ULiège > Département de mathématique > Statistique mathématique
Raîche, Gilles
Béland, Sébastien
Gérard, Paul
Language :
English
Title :
A generalized logistic regression procedure to detect differential item functioning among multiple groups