Abdi H, Williams LJ. Principal component analysis. Wiley Interdiscip Rev Comput Stat 2010; 2: 433-59.
Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol 1933; 24: 498-520.
Jolliffe IT. A note on the use of principal components in regression. Appl Stat 1982; 31: 300-3.
Jolliffe IT. Principal component analysis and factor analysis. Princ Compon Anal 1986; 115-28.
Park SH. Collinearity and optimal restrictions on regression parameters for estimating responses. Technometrics 1981; 23: 289-95.
Schena M, Shalon D, Davis RW, et al. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467-70.
Tang B, Wang Y, Zhu J, et al. Web resources for model organism studies. Genom Proteom Bioinform 2015; 13: 64-8.
da Fonseca RR, Albrechtsen A, Themudo GE, et al. Next-generation biology: sequencing and data analysis approaches for non-model organisms. Mar Genomics 2016; 30: 3-13.
Novembre J, Stephens M. Interpreting principal component analyses of spatial population genetic variation. Nat Genet 2008; 40: 646-9.
Price AL, Patterson NJ, Plenge RM, et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38: 904-9.
Cavalli-Sforza LL, Menozzi P, Piazza A. Demic expansions and human evolution. Science 1993; 259: 639-46.
Lao O, Lu TT, Nothnagel M, et al. Correlation between genetic and geographic structure in Europe. Curr Biol 2008; 18: 1241-8.
Conomos MP, Reiner AP, Weir BS, et al. Model-free estimation of recent genetic relatedness. Am J Hum Genet 2016; 98: 127-48.
Niu A, Zhang S, Sha Q. A novel method to detect gene-gene interactions in structured populations: MDR-SP. Ann Hum Genet 2011; 75: 742-54.
Reich D, Price AL, Patterson N. Principal component analysis of genetic data. Nat Genet 2008; 40: 491-2.
Lawson DJ, Falush D. Population identification using genetic data. Annu Rev Genomics Hum Genet 2012; 13: 337-61.
Stacklies W, Redestig H, Scholz M, et al. pcaMethods-a Bioconductor package providing PCA methods for incomplete data. Bioinformatics 2007; 23: 1164-7.
Liu L, Zhang D, Liu H, et al. Robust methods for population stratification in genome wide association studies. BMC Bioinform 2013; 14: 132.
Maadooliat M, Huang JZ, Hu J. Integrating data transformation in principal components analysis. J Comput Graph Stat 2015; 24: 84-103.
Pearson K. LIII. On lines and planes of closest fit to systems of points in space. Lond Edinb Dublin Philos Mag J Sci 1901; 2: 559-72.
Thomas DC, Witte JS. Point: population stratification: a problem for case-control studies of candidate-gene associations? Cancer Epidemiol Biomark Prev 2002; 11: 505-12.
Wacholder S, Rothman N, Caporaso N. Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiol Biomark Prev 2002; 11: 513-20.
Thomas DC, Haile RW, Duggan D. Recent developments in genomewide association scans: a workshop summary and review. Am J Hum Genet 2005; 77: 337-45.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika 1983; 70: 41-55.
Jiang Y, Zhang H. Propensity score-based nonparametric test revealing genetic variants underlying bipolar disorder. Genet Epidemiol 2011; 35: 125-32.
Zhao H, Rebbeck TR, Mitra N. A propensity score approach to correction for bias due to population stratification using genetic and non-genetic factors. Genet Epidemiol 2009; 33: 679-90.
Pritchard JK, Stephens M, Rosenberg NA, et al. Association mapping in structured populations. Am J Hum Genet 2000; 67: 170-81.
Devlin B, Roeder K. Genomic control for association studies. Biometrics 1999; 55: 997-1004.
Navas N, Romero-Pastor J, Manzano E, et al. Raman spectroscopic discrimination of pigments and tempera paint model samples by principal component analysis on first-derivative spectra. J Raman Spectrosc 2010; 41: 1486-93.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet 2006; 2: e190.
Paschou P, Ziv E, Burchard EG, et al. PCA-Correlated SNPs for structure identification in worldwide human populations. PLoS Genet 2007; 3: e160.
Heath SC, Gut IG, Brennan P, et al. Investigation of the fine structure of European populations with applications to disease association studies. Eur J Hum Genet 2008; 16: 1413-29.
Himes BE, Jiang X, Hu R, et al. Genome-wide association analysis in asthma subjects identifies SPATS2L as a novel bronchodilator response gene. PLoS Genet 2012; 8: e1002824.
Tantisira KG, Damask A, Szefler SJ, et al. Genome-wide association identifies the T gene as a novel asthma pharmacogenetic locus. Am J Respir Crit Care Med 2012; 185: 1286-91.
Ma J, Amos CI. Theoretical formulation of principal components analysis to detect and correct for population stratification. PLoS One 2010; 5: e12510.
Engelhardt BE, Stephens M. Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis. PLoS Genet 2010; 6: e1001117, https://doi.org/10.1371/journal.pgen.1001117.
Popescu A-A, Harper AL, Trick M, et al. A novel and fast approach for population structure inference using kernel-PCA and optimization. Genetics 2014; 198: 1421-31.
Huang H, Fang M, Jostins L, et al. Fine-mapping inflammatory bowel disease loci to single-variant resolution. Nature 2017; 547: 173-8.
Chen H, Wang C, Conomos MP, et al. Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. Am J Hum Genet 2016; 98: 653-66.
Kang HM, Zaitlen NA, Wade CM, et al. Efficient control of population structure in model organism association mapping. Genetics 2008; 178: 1709-23.
Thomas SC, Hill WG. Estimating quantitative genetic parameters using sibships reconstructed from marker data. Genetics 2000; 155: 1961-72.
Wang J. An estimator for pairwise relatedness using molecular markers. Genetics 2002; 160: 1203-15.
Nievergelt CM, Libiger O, Schork NJ. Generalized analysis of molecular variance. PLoS Genet 2007; 3: e51.
Zhao K, Aranzana MJ, Kim S, et al. An Arabidopsis example of association mapping in structured samples. PLoS Genet 2007; 3: e4.
Kang HM, Sul JH, Service SK, et al. Variance component model to account for sample structure in genome-wide association studies. Nat Genet 2010; 42: 348-54.
Zhou X, Stephens M. Genome-wide efficient mixed model analysis for association studies. Nat Genet 2012; 44: 821-4.
Lippert C, Listgarten J, Liu Y, et al. FaST linear mixed models for genome-wide association studies. Nat Methods 2011; 8: 833-5.
Aulchenko YS, Ripke S, Isaacs A, et al. GenABEL: an R library for genome-wide association analysis. Bioinformatics 2007; 23: 1294-6.
Hoffman GE. Correcting for population structure and kinship using the linear mixed model: theory and extensions. PLoS One 2013; 8: e75707.
Listgarten J, Lippert C, Kang EY, et al. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics 2013; 29: 1526-33.
Yang J, Zaitlen NA, Goddard ME, et al. Advantages and pitfalls in the application of mixed model association methods. Nat Genet 2014; 46: 100-6.
Tucker G, Price AL, Berger B. Improving the power of GWAS and avoiding confounding from population stratification with PC-Select. Genetics 2014; 197: 1045-9.
Kotsiantis S, Pintelas P. Recent advances in clustering: a brief survey. WSEAS Trans Inf Sci Appl 2004; 1: 73-81.
Lee S, Huang JZ, Hu J. Sparse logistic principal components analysis for binary data. Ann Appl Stat 2010; 4: 1579-601.
Solovieff N, Hartley SW, Baldwin CT, et al. Clustering by genetic ancestry using genome-wide SNP data. BMC Genet 2010; 11: 108.
Ben-Hur A, Guyon I. Detecting stable clusters using principal component analysis. Methods Mol Biol 2003; 224: 159-82.
Maus B, Jung C, Mahachie John JM, et al. Molecular reclassification of Crohn's disease: a cautionary note on population stratification. PLoS One 2013; 8: e77720.
Rencher AC, Christensen WF. Methods of Multivariate Analysis, 2012.
Everitt B, Hothorn T. An Introduction to Applied Multivariate Analysis with R, 2011.
Jolliffe IT. Discarding variables in a principal component analysis. I: artificial data. J R Stat Soc Ser C Appl Stat 1972; 21: 160-73.
Abraham G, Inouye M. Fast principal component analysis of large-scale genome-wide data. PLoS One 2014; 9: e93766.
Alanis-Lobato G, Cannistraci CV, Eriksson A, et al. Highlighting nonlinear patterns in population genetics datasets. Sci Rep 2015; 5: 8140.
Price AL, Zaitlen NA, Reich D, et al. New approaches to population stratification in genome-wide association studies. Nat Rev Genet 2010; 11: 459-63.
Galinsky KJ, Bhatia G, Loh P-R, et al. Fast principal-component analysis reveals convergent evolution of ADH1B in Europe and East Asia. Am J Hum Genet 2016; 98: 456-72.
Bush WS, Moore JH. Chapter 11: genome-wide association studies. PLoS Comput Biol 2012; 8: e1002822.
Zou F, Lee S, Knowles MR, et al. Quantification of population structure using correlated SNPs by shrinkage principal components. Hum Hered 2010; 70: 9-22.
Gusev A, Bhatia G, Zaitlen N, et al. Quantifying missing heritability at known GWAS loci. PLoS Genet 2013; 9: e1003993.
Clayton D. snpStats: SnpMatrix and XSnpMatrix Classes and Methods, 2015.
Clayton D, Leung H-T. An R package for analysis of whole-genome association studies. Hum Hered 2007; 64: 45-51.
Jostins L, Ripke S, Weersma RK, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature 2012; 491: 119-24.
Balding DJ, Nichols RA. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity. Genetica 1995; 96: 3-12.
Wu C, DeWan A, Hoh J, et al. A comparison of association methods correcting for population stratification in case-control studies. Ann Hum Genet 2011; 75: 418-27.
Wright S. The genetical structure of populations. Ann Eugen 1951; 15: 323-54.
Cattell RB. The Scree Test for the number of factors. Multivar Behav Res 1966; 1: 245-76.
Kaiser HF. The application of electronic computers to factor analysis. Educ Psychol Meas 1960; 20: 141-51.
Velicer WF. Determining the number of components from the matrix of partial correlations. Psychometrika 1976; 41: 321-7.
Dray S. On the number of principal components: a test of dimensionality based on measurements of similarity between matrices. Comput Stat Data Anal 2008; 52: 2228-37.
Hansen LK, Larsen J, Nielsen FA, et al. Generalizable patterns in neuroimaging: how many principal components? NeuroImage 1999; 9: 534-44.
Li Q, Wacholder S, Hunter DJ, et al. Genetic background comparison using distance-based regression, with applications in population stratification evaluation and adjustment. Genet Epidemiol 2009; 33: 432-41.
Lee D, Lee W, Lee Y, et al. Super-sparse principal component analyses for high-throughput genomic data. BMC Bioinform 2010; 11: 296, https://doi.org/10.1186/1471-2105-11-296.
Peloso GM, Lunetta KL. Choice of population structure informative principal components for adjustment in a case-control study. BMC Genet 2011; 12: 64.
Yu K, Wang Z, Li Q, et al. Population substructure and control selection in genome-wide association studies. PLoS One 2008; 3: e2551.
Peloso GM, Timofeev N, Lunetta KL. Principal-component-based population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of single-nucleotide polymorphism set and analysis method. BMC Proc 2009; 3: S108.
Li Q, Yu K. Improved correction for population stratification in genome-wide association studies by identifying hidden population structures. Genet Epidemiol 2008; 32: 215-26.
Zou H, Hastie T, Tibshirani R. Sparse principal component analysis. J Comput Graph Stat 2006; 15: 265-86.
Jolliffe IT. Rotation of principal components: choice of normalization constraints. J Appl Stat 1995; 22: 29-35.
McVean G. A genealogical interpretation of principal components analysis. PLoS Genet 2009; 5: e1000686.
Li G, Chen Z. Projection-pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo. J Am Stat Assoc 1985; 80: 759-66.
Croux C, Ruiz-Gazen A. High breakdown estimators for principal components: the projection-pursuit approach revisited. J Multivar Anal 2005; 95: 206-26.
Croux C, Filzmoser P, Oliveira MR. Algorithms for Projection-Pursuit robust principal component analysis. Chemom Intell Lab Syst 2007; 87: 218-25.
Shen H, Huang JZ. Sparse principal component analysis via regularized low rank matrix approximation. J Multivar Anal 2008; 99: 1015-34.
Lee AB, Luca D, Roeder K. A spectral graph approach to discovering genetic ancestry. Ann Appl Stat 2010; 4: 179-202.
Belkin M, Niyogi P. Laplacian Eigenmaps for dimensionality reduction and data representation. Neural Comput 2003; 15: 1373-96.
Nelson MR, Bryc K, King KS, et al. The population reference sample, POPRES: a resource for population, disease, and pharmacological genetics research. Am J Hum Genet 2008; 83: 347-58.
Linting M, Meulman JJ, Groenen PJF, et al. Nonlinear principal components analysis: introduction and application. Psychol Methods 2007; 12: 336-58.
Landgraf AJ, Lee Y. Dimensionality reduction for binary data through the projection of natural parameters. arXiv:1510.06112 [stat], 2015.
Collins M, Dasgupta S, Schapire RE. A generalization of principal component analysis to the exponential family. Proc 14th Int Conf Neural Inf Process Syst Nat Synth 2001: 617-24.
de Leeuw J. Principal component analysis of binary data by iterated singular value decomposition. Comput Stat Data Anal 2006; 50: 21-39.
Schein AI, Saul LK, Ungar LH. A generalized linear model for principal component analysis of binary data. Proc 9th Int Workshop Artif Intell Stat 2003: 546431.
Lu M, Huang JZ, Qian X. Sparse exponential family principal component analysis. Pattern Recognit 2016; 60: 681-91.
Song Y, Westerhuis JA, Aben N, et al. Principal component analysis of binary genomics data. Brief Bioinform 2017, bbx119, https://doi.org/10.1093/bib/bbx119.
Konishi S. Introduction to Multivariate Analysis: Linear and Nonlinear Modeling, 2014.
Theodoridis S, Koutroumbas K. Pattern Recognition, Fourth Edition, 2008.
Tipping ME, Bishop CM. Probabilistic principal component analysis. J R Stat Soc Ser B Stat Methodol 1999; 61: 611-22.
Nounou MN, Bakshi BR, Goel PK, et al. Bayesian principal component analysis. J Chemom 2002; 16: 576-95.
Mohamed S, Ghahramani Z, Heller KA. Bayesian Exponential Family PCA. Adv Neural Inf Process Syst 2009; 21: 1089-96.
Liaw A, Wiener M. Classification and regression by RandomForest. R News 2002; 2: 18-22.
Rutkoski JE, Poland J, Jannink J-L, et al. Imputation of unordered markers and the impact on genomic selection accuracy. G3 Genes Genomes Genet 2013; 3: 427-39.
Wright MN, Ziegler A. Ranger: a fast implementation of random forests for high dimensional data in C++ and R. arXiv preprint arXiv:1508.04409, 2015.
Fu Y-B. Genetic diversity analysis of highly incomplete SNP genotype data with imputations: an empirical assessment. G3 Genes Genomes Genet 2014; 4: 891-900.
Wang C, Zhan X, Liang L, et al. Improved ancestry estimation for both genotyping and sequencing data using Projection Procrustes Analysis and Genotype Imputation. Am J Hum Genet 2015; 96: 926-37.
Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet 2012; 44: 243-6.
Wang C, Zhan X, Bragg-Gresham J, et al. Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet 2014; 46: 409-15.
Fumagalli M, Vieira FG, Korneliussen TS, et al. Quantifying population genetic differentiation from next-generation sequencing data. Genetics 2013; 195: 979-92.
Conomos MP, Miller M, Thornton T. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet Epidemiol 2015; 39: 276-93.
Thornton T, McPeek MS. Case-control association testing with related individuals: a more powerful quasi-likelihood score test. Am J Hum Genet 2007; 81: 321-37.
Choi Y, Wijsman EM, Weir BS. Case-control association testing in the presence of unknown relationships. Genet Epidemiol 2009; 33: 668-78.
Thornton T, McPeek MS. ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure. Am J Hum Genet 2010; 86: 172-84.
Li M, Reilly MP, Rader DJ, et al. Correcting population stratification in genetic association studies using a phylogenetic approach. Bioinformatics 2010; 26: 798-806.
Zhu X, Li S, Cooper RS, et al. A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet 2008; 82: 352-65.
Ziegler A, König IR, Pahlke F. A Statistical Approach to Genetic Epidemiology: Concepts and Applications, with an e-Learning Platform, 2010.
Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26: 2190-1.
Mägi R, Morris AP. GWAMA: software for genome-wide association meta-analysis. BMC Bioinform 2010; 11: 288, http://www.biomedcentral.com/1471-2105/11/288.
Qayyum R, Snively BM, Ziv E, et al. A meta-analysis and genome-wide association study of platelet count and mean platelet volume in African Americans. PLoS Genet 2012; 8: e1002491.
Wang M, Jiang N, Jia T, et al. Genome-wide association mapping of agronomic and morphologic traits in highly structured populations of barley cultivars. Theor Appl Genet 2012; 124: 233-46.
Bailey-Wilson JE, Brennan JS, Bull SB, et al. Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 Mini-Exome Data. Genet Epidemiol 2011; 35: S92-100.
Keen-Kim D, Mathews CA, Reus VI, et al. Over representation of rare variants in a specific ethnic group may confuse interpretation of association analyses. Hum Mol Genet 2006; 15: 3324-8.
Setakis E, Stirnadel H, Balding DJ. Logistic regression protects against population structure in genetic association studies. Genome Res 2006; 16: 290-6.
Astle W, Balding DJ. Population structure and cryptic relatedness in genetic association studies. Stat Sci 2009; 24: 451-71.
Bouaziz M, Ambroise C, Guedj M. Accounting for population stratification in practice: a comparison of the main strategies dedicated to genome-wide association studies. PLoS One 2011; 6: e28845.
Sillanpää MJ. Overview of techniques to account for confounding due to population stratification and cryptic relatedness in genomic data association analyses. Heredity 2011; 106: 511-9.
Wawro N, Bammann K, Pigeot I. Testing for association in the presence of population stratification: a simulation study comparing the S-TDT, STRAT and the GC. Biom J 2006; 48: 420-34.
Kraft P. Population stratification bias: more widespread than previously thought. Epidemiology 2011; 22: 408-9.
Bhattacharjee S, Wang Z, Ciampa J, et al. Using principal components of genetic variation for robust and powerful detection of gene-gene interactions in case-control and case-only studies. Am J Hum Genet 2010; 86: 331-42.
Zhao Y, Chen F, Zhai R, et al. Correction for population stratification in random forest analysis. Int J Epidemiol 2012; 41: 1798-806.
Van Steen K. Travelling the world of gene-gene interactions. Brief Bioinform 2012; 13: 1-19.
Calle ML, Urrea Gales V, Malats i Riera N, et al. MB-MDR: Model-Based Multifactor Dimensionality Reduction for Detecting Interactions in High-Dimensional Genomic Data. Tech. Rep. Vic, Spain: Department of Systems Biology, Universitat de Vic, 2008; 24.
Cattaert T, Calle ML, Dudek SM, et al. Model-based multifactor dimensionality reduction for detecting epistasis in case-control data in the presence of noise. Ann Hum Genet 2011; 75: 78-89.
Gola D, Mahachie John JM, van Steen K, et al. A roadmap to multifactor dimensionality reduction methods. Brief Bioinform 2016; 17: 293-308.