Single nucleotide polymorphisms (SNPs) are commonly used to capture variation between populations. Genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns, or small subsets of SNPs are selected (e.g. PCA-correlated SNPs) to reproduce the genomic structure of the complete data set. Identifying and differentiating subpopulations with such a reduced set can be challenging, especially when the subpopulations come from similar geographic regions or when spurious patterns are likely to exist.
Although PCA-based methods can resolve structure, they cannot infer ancestry. The structure of haplotypes in unrelated individuals, on the other hand, can reveal useful information about genetic ancestry. Notably, haplotype composition and the pattern of LD between markers may differ between large populations and may also be informative within more confined geographic regions. In addition, iterative pruning principal component analysis (ipPCA) has been shown to be a powerful tool for clustering subpopulations based on SNP profiles.
Despite the complexities associated with haplotype inference, we argue that exploiting the LD structure between SNPs adds value in the search for relevant population strata. In this work, we propose to combine a novel LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructure. The approach is compared to state-of-the-art methods in the context of population substructure and admixture analysis.
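To make the iterative pruning idea concrete, the following is a minimal sketch in Python, not the authors' implementation. It repeatedly applies PCA to a group of individuals, splits the group in two with a plain 2-means step, and stops when no clear structure remains. Everything named here is an assumption made for illustration: the function names, the `min_size` and `min_share` parameters, and in particular the stopping rule (share of variance on the first component), which stands in for whatever formal test the actual ipPCA method applies. The haplotype encoding is not specified in this abstract, so the sketch simply accepts any numeric individuals-by-features matrix.

```python
import numpy as np


def top_pcs(features, n_pcs=2):
    """Return PC scores for the rows of `features` and all singular values."""
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs], s


def two_means(scores, n_iter=50):
    """Plain 2-means on PC scores, seeded with the two extremes of PC1."""
    centers = scores[[scores[:, 0].argmin(), scores[:, 0].argmax()]]
    labels = np.zeros(len(scores), dtype=int)
    for _ in range(n_iter):
        dist = ((scores[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for k in range(2):
            if (labels == k).any():
                centers[k] = scores[labels == k].mean(axis=0)
    return labels


def ippca_sketch(features, min_size=10, min_share=0.2):
    """Iteratively split individuals into subgroups (illustrative only).

    Assumed stopping rule: a group is final when it is too small to split,
    when PC1 explains less than `min_share` of the variance, or when a
    split would create a subgroup smaller than `min_size`.
    """
    n = features.shape[0]
    pending = [np.arange(n)]
    final = []
    while pending:
        idx = pending.pop()
        if len(idx) < 2 * min_size:
            final.append(idx)
            continue
        scores, s = top_pcs(features[idx])
        total = (s ** 2).sum()
        if total == 0 or s[0] ** 2 / total < min_share:
            final.append(idx)
            continue
        split = two_means(scores)
        left, right = idx[split == 0], idx[split == 1]
        if min(len(left), len(right)) < min_size:
            final.append(idx)
            continue
        pending.extend([left, right])
    labels = np.zeros(n, dtype=int)
    for k, idx in enumerate(final):
        labels[idx] = k
    return labels
```

Calling `ippca_sketch(matrix)` on an individuals-by-SNPs genotype matrix coded 0/1/2 (or on a haplotype-encoded matrix of the same shape) returns one integer cluster label per individual. The variance-share threshold is a deliberately crude choice that keeps the sketch short; it would need replacing with a proper significance test before any real analysis.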
Research Center/Unit :
Systems and Modeling Unit, Montefiore Institute and Bioinformatics and Modeling, GIGA-R
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Chaichoompu, Kridsadakorn; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Fouladi, Ramouna; University of Liege > Montefiore Institute > Systems and Modeling Unit
Wangkumhang, Pongsakorn; National Center for Genetic Engineering and Biotechnology > Genome Institute > Biostatistics and informatics Laboratory
Wilantho, Alisa; National Center for Genetic Engineering and Biotechnology > Genome Institute > Biostatistics and informatics Laboratory
Chareanchim, Wanwisa; National Center for Genetic Engineering and Biotechnology > Genome Institute > Systems and Modeling Unit
Tongsima, Sissades; National Center for Genetic Engineering and Biotechnology > Genome Institute > Biostatistics and informatics Laboratory
Sakuntabhai, Anavaj; Institut Pasteur > Functional Genetics of Infectious Diseases Unit
Van Steen, Kristel; University of Liege > Montefiore Institute > Systems and Modeling Unit
Language :
English
Title :
Haplotype information combined with iterative pruning PCA (ipPCA) to improve population clustering
Publication date :
01 April 2014
Event name :
The 42nd European Mathematical Genetics Meeting 2014
Event organizer :
Statistical Genetics and Bioinformatics Group, Cologne Center for Genomics (CCG), University of Cologne