Anorexia Nervosa/genetics; European Continental Ancestry Group/genetics; Gene Frequency; Genetic Markers; Genetics, Population/methods; Genome-Wide Association Study; Genotyping Techniques; Humans; Oligonucleotide Array Sequence Analysis; Phylogeography; Polymorphism, Single Nucleotide; Principal Component Analysis; Reproducibility of Results; Sample Size
Abstract :
[en] The Wellcome Trust Case Control Consortium 3 anorexia nervosa genome-wide association scan includes 2907 cases from 15 different populations of European origin genotyped on the Illumina 670K chip. We compared methods for identifying population stratification and suggest a list of markers that may help to counter this problem. It is usual to identify population structure in such studies using only common variants with minor allele frequency (MAF) >5%; we find that this may result in highly informative SNPs being discarded, and suggest that all SNPs with MAF >1% may be used instead. We established informative axes of variation via principal component analysis and highlight important features of the genetic structure of diverse European-descent populations, some studied for the first time at this scale. Finally, we investigated the substructure within each of these 15 populations and identified SNPs that help capture hidden stratification. This work can inform the design and interpretation of association studies conducted by international consortia.
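The MAF filter and principal component step described in the abstract can be illustrated with a minimal Python sketch. It is not the study's pipeline: the function names (`minor_allele_frequency`, `pca_axes`, `simulate_two_populations`), the simulated allele frequencies, and the per-SNP standardisation are assumptions chosen for illustration, and the genotype matrix is assumed to hold complete 0/1/2 allele counts.

```python
import numpy as np


def minor_allele_frequency(genotypes):
    """MAF per SNP for an (n_samples, n_snps) matrix of 0/1/2 allele counts."""
    freq = genotypes.mean(axis=0) / 2.0
    return np.minimum(freq, 1.0 - freq)


def pca_axes(genotypes, maf_threshold=0.01, n_components=2):
    """Keep SNPs with MAF above the threshold, then project samples onto top PCs."""
    keep = minor_allele_frequency(genotypes) > maf_threshold
    g = genotypes[:, keep].astype(float)
    p = g.mean(axis=0) / 2.0
    # Centre each SNP and scale by its binomial standard deviation
    # (an illustrative normalisation; kept SNPs have 0 < p < 1, so this is safe).
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return scores, keep


def simulate_two_populations(n_per_pop=40, n_snps=300, seed=0):
    """Hypothetical toy data: two groups with clearly different allele frequencies."""
    rng = np.random.default_rng(seed)
    pop_a = rng.binomial(2, 0.2, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, 0.6, size=(n_per_pop, n_snps))
    return np.vstack([pop_a, pop_b])


genotypes = simulate_two_populations()
scores, keep = pca_axes(genotypes, maf_threshold=0.01)
print("SNPs kept:", int(keep.sum()), "of", genotypes.shape[1])
print("PC1 mean, group A:", scores[:40, 0].mean())
print("PC1 mean, group B:", scores[40:, 0].mean())
```

Lowering `maf_threshold` from 0.05 to 0.01 can only retain the same SNPs or more, which is the trade-off the abstract discusses: rarer variants stay available as potentially informative markers.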