LD; haplotype; principal component analysis; PCA; ipPCA
Abstract :
Objective
To identify and differentiate subpopulations using a rich set of genetic markers. Reduced marker sets can make this task challenging, especially when the populations come from similar geographic regions or when spurious patterns are likely to exist.
Method
Single nucleotide polymorphisms (SNPs) are commonly used to capture variation between populations, and genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns. Haplotype composition and the pattern of LD between markers are known to vary between large populations, but they may also differ within more confined geographic regions. Indeed, haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Here, we use iterative pruning principal component analysis (ipPCA) [1] to identify and characterize subpopulations in an unsupervised way. The input is either pruned genome-wide SNP data (PLINK 1.9 with the "indep-pairwise" option, window size = 100k, r² < 0.25) or multilocus haplotype information derived from the genome-wide SNP panel (haplotypes inferred with BEAGLE 3.3.2). Both approaches are applied to real-life data from 992 Thai individuals [2].
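As an illustration of the LD-pruning step mentioned in the Method, the following is a minimal Python sketch of greedy pairwise r² pruning. It is not PLINK's implementation: PLINK's "indep-pairwise" uses its own windowing, step size, and removal rules, whereas this sketch simply walks through the SNPs in order and drops any SNP whose squared genotype correlation with an already retained SNP inside the window reaches the threshold. The function name `ld_prune`, the `window` parameter (counted in SNPs, with a default of 50), and the toy genotype matrix are assumptions made for this example; only the r² threshold of 0.25 comes from the abstract.

```python
import numpy as np


def ld_prune(genotypes, window=50, r2_threshold=0.25):
    """Greedy LD pruning on an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Simplified sketch (hypothetical helper, not PLINK's algorithm).
    Returns the column indices of the SNPs that are kept.
    """
    g = np.asarray(genotypes, dtype=float)
    n_snps = g.shape[1]
    # Centre each SNP so that r is a normalised dot product.
    centered = g - g.mean(axis=0)
    norms = np.sqrt((centered ** 2).sum(axis=0))
    kept = []
    for j in range(n_snps):
        if norms[j] == 0:
            continue  # monomorphic SNP: no variation, so it is dropped
        is_redundant = False
        # kept is in increasing order, so scan the nearest retained SNPs first.
        for k in reversed(kept):
            if j - k >= window:
                break  # all remaining retained SNPs lie outside the window
            r = centered[:, j] @ centered[:, k] / (norms[j] * norms[k])
            if r * r >= r2_threshold:
                is_redundant = True
                break
        if not is_redundant:
            kept.append(j)
    return kept


# Toy data (invented for illustration): SNP 1 duplicates SNP 0,
# SNP 2 is uncorrelated with SNP 0, and SNP 3 is monomorphic.
example = np.array([
    [0, 0, 1, 1],
    [1, 1, 0, 1],
    [2, 2, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 2, 1],
    [2, 2, 1, 1],
])
print(ld_prune(example))  # [0, 2]
```

The retained columns could then be passed to a PCA-based procedure such as ipPCA. For real data, the PLINK output should be used rather than this sketch.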
Result
Preliminary results indicate that ipPCA applied to pruned SNP data and ipPCA that explicitly uses multilocus information (haplotypes) give complementary information about population substructure in geographically confined populations such as the Thai samples in this study. The two methods address different aspects of population structure. Detailed simulation studies are needed to identify the scenarios in which haplotype-based ipPCA performs best.
Conclusion
In this work, we propose combining an LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructure. Despite the complexities associated with haplotype inference, added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata.
References
1. Intarapanich, A., et al., Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics, 2009. 10: p. 382.
2. Wangkumhang, P., et al., Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure. PLoS One, 2013. 8(11): p. e79522.
Research Center/Unit :
Systems and Modeling Unit
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Chaichoompu, Kridsadakorn ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Fouladi, Ramouna ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Wangkumhang, Pongsakorn; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Wilantho, Alisa; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Chareanchim, Wanwisa; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Tongsima, Sissades; National Center for Genetic Engineering and Biotechnology, Thailand > Genome Institute > Biostatistics and informatics Laboratory
Sakuntabhai, Anavaj; Institut Pasteur, France > Functional Genetics of Infectious Diseases Unit
Van Steen, Kristel ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Language :
English
Title :
LD-based haplotype encoding scheme with iterative pruning principal component analysis (ipPCA) to retrieve population substructures