Keywords :
Regression; Data structure; Prediction; Simulation
Abstract :
Monte Carlo simulation methods were used to study the effects of the data structure on the quality of predictions in multiple linear regression. Five hundred and forty (540) data files were generated in which the number of variables, the R-square, the collinearity between the explanatory variables and the coefficient index, which measures the importance of the explanatory variables in the model, were controlled. Predictions were influenced by the theoretical value of the R-square, by the method used to establish the model and, to a lesser extent, by the collinearity between the explanatory variables. Determining the minimal sample size at which the predicted values are better than those obtained from the mean of the dependent variable showed that this size depends on the number of explanatory variables, the theoretical value of the R-square and the method used to establish the model. The minimal sample size is largest for models without variable selection and decreases gradually as the intensity of the selection increases.
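The record does not include the simulation code. The following is only a minimal Python sketch of the kind of Monte Carlo comparison the abstract describes, built on assumptions that are not taken from the paper: independent standard-normal explanatory variables (no collinearity), all coefficients equal to 1, normal errors scaled so the theoretical R-square equals `r2`, a full model fitted by ordinary least squares (no variable selection), and mean squared prediction error on fresh test points as the quality criterion. The function names `simulate` and `minimal_sample_size` are hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_fit(xs, ys):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + x for x in xs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n, p, r2, reps=500, n_test=50, seed=0):
    """Return (mse_ols, mse_mean) averaged over Monte Carlo replicates.

    Hypothetical design (assumption, not the paper's): p independent N(0, 1)
    predictors, all coefficients equal to 1, normal errors whose variance makes
    the theoretical R-square equal to r2 (0 < r2 < 1), full model, no selection.
    """
    rng = random.Random(seed)
    sigma = (p * (1 - r2) / r2) ** 0.5  # Var(X beta) = p, so R2 = p / (p + sigma^2)

    def draw(m):
        xs = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(m)]
        ys = [sum(x) + rng.gauss(0, sigma) for x in xs]
        return xs, ys

    se_ols = se_mean = 0.0
    for _ in range(reps):
        xs, ys = draw(n)
        coef = ols_fit(xs, ys)
        ybar = sum(ys) / n  # competing predictor: mean of the dependent variable
        tx, ty = draw(n_test)  # fresh data on which predictions are judged
        for x, y in zip(tx, ty):
            pred = coef[0] + sum(c * v for c, v in zip(coef[1:], x))
            se_ols += (y - pred) ** 2
            se_mean += (y - ybar) ** 2
    total = reps * n_test
    return se_ols / total, se_mean / total


def minimal_sample_size(p, r2, n_max=200, **kw):
    """Smallest n (scanning upward from p + 3) at which OLS beats the mean of y."""
    for n in range(p + 3, n_max + 1):
        mse_ols, mse_mean = simulate(n, p, r2, **kw)
        if mse_ols < mse_mean:
            return n
    return None  # no such n found up to n_max
```

For example, `minimal_sample_size(3, 0.7)` scans sample sizes upward and returns the first one at which the regression's simulated prediction error falls below that of the mean. Because this is a Monte Carlo estimate, results near the threshold can shift with `seed` and `reps`; the sketch also leaves out the collinearity, coefficient-index and variable-selection factors that the study controlled.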
Disciplines :
Physical, chemical, mathematical & earth Sciences: Multidisciplinary, general & others
Author, co-author :
Akossou, A. Y. J.
Palm, Rodolphe ; Faculté Universitaire des Sciences Agronomiques de Gembloux - FUSAGx > Sciences agronomiques > Statistique, Informatique et Mathématique appliquées
Language :
English
Title :
Validity Limit of the Linear Regression Models for the Prediction
Publication date :
March 2010
Journal title :
International Journal of Applied Mathematics and Statistics
ISSN :
0973-1377
Publisher :
Centre for Environment, Social & Economic Research (CESER), India