Poster (Scientific congresses and symposiums)
A novel unsupervised clustering approach with multiple data types to reveal fine-level structure
Chaichoompu, Kridsadakorn; Tongsima, Sissades; Shaw, Philip James et al.
2016, the European Human Genetics Conference 2016 (ESHG 2016), Barcelona
 

Files


Full Text
poster_eshg_11052016.pdf
Author preprint (5.78 MB)



Details



Abstract :
[en] Introduction: Several methods exist to identify population substructure arising from shared genetic ancestry or regional proximity. These may be SNP-based or haplotype-based (Price et al. 2006, Lawson et al. 2012). Here, we present a flexible unsupervised clustering approach built on the ipPCA machinery (Intarapanich et al. 2009).
Methods: Our method supports both numeric and categorical data and can be applied to panels of SNPs and/or haplotypes, or to gene-based integrative summaries (Fouladi et al. 2015). Unlike ipPCA, our method involves an iterative process using binary and ternary splits based on multivariate Gaussian mixture modeling of principal components (PCs) and Clustering EM (CEM) estimation, as in Lebret et al. (2015). To assess performance, we considered simulated scenarios with FST in [0.0005, 0.006], 5,000-20,000 independent SNPs in Hardy-Weinberg equilibrium (HWE), 500-8,000 individuals, and 2-4 populations (Balding and Nichols 1995), with 100 replicates per scenario. SNPs were treated as categorical or continuous (including ancestry-corrected SNPs). Haplotype-based runs used HapMap 3 data: CHB-JPT (FST=0.007) and CEU-TSI (FST=0.004).
Results and Conclusion: In simulated scenarios of extremely subtle structure (FST in [0.0009, 0.002]), a population classification accuracy of 92.56% or greater was obtained, which was superior to ipPCA. Promising results in detecting fine structure were also obtained for the HapMap populations. We believe that the ability of our approach to detect subtle structure, including outlier individuals, will be important in molecular reclassification studies of patients from whom underlying population patterns have been removed.
Grants: KC and KVS acknowledge FNRS, AS acknowledges ANR, ST acknowledges NSTDA, and PJS acknowledges TRF.
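The simulated scenarios described in the Methods (independent SNPs in HWE, a fixed FST, several populations, following Balding and Nichols 1995) can be illustrated with a minimal sketch. This is not the authors' code: the function name `simulate_balding_nichols`, its parameters, and the uniform range used for ancestral allele frequencies are assumptions made for illustration only.

```python
import random


def simulate_balding_nichols(n_snps, pop_sizes, fst, seed=0, anc_range=(0.1, 0.9)):
    """Simulate unlinked SNP genotypes (0/1/2) under the Balding-Nichols model.

    Hypothetical helper, not taken from the poster.
    n_snps    : number of independent SNPs
    pop_sizes : list with the number of individuals per population
    fst       : FST shared by all populations (0 < fst < 1)
    anc_range : range of the ancestral allele frequency (an assumption)

    Returns (genotypes, labels, pop_freqs), where genotypes[i][j] is the
    allele count of individual i at SNP j, labels[i] is the population
    index of individual i, and pop_freqs[j][k] is the allele frequency
    of SNP j in population k.
    """
    rng = random.Random(seed)
    n_pops = len(pop_sizes)
    labels = [k for k, size in enumerate(pop_sizes) for _ in range(size)]
    genotypes = [[0] * n_snps for _ in labels]
    pop_freqs = []
    scale = (1.0 - fst) / fst  # Beta precision implied by FST

    for j in range(n_snps):
        # Ancestral allele frequency for this SNP.
        p_anc = rng.uniform(*anc_range)
        # Population frequencies: Beta(p(1-F)/F, (1-p)(1-F)/F).
        freqs = [
            rng.betavariate(p_anc * scale, (1.0 - p_anc) * scale)
            for _ in range(n_pops)
        ]
        pop_freqs.append(freqs)
        # HWE: each genotype is the sum of two independent allele draws.
        for i, k in enumerate(labels):
            p = freqs[k]
            genotypes[i][j] = int(rng.random() < p) + int(rng.random() < p)

    return genotypes, labels, pop_freqs


if __name__ == "__main__":
    # Small illustrative run; the scenarios in the abstract are far larger.
    geno, labels, freqs = simulate_balding_nichols(
        n_snps=200, pop_sizes=[50, 50, 50], fst=0.001, seed=42
    )
    print(len(geno), "individuals x", len(geno[0]), "SNPs")
```

Under this model the population allele frequency has mean p and variance FST·p·(1-p) around the ancestral frequency p, which is why very small FST values such as those quoted in the abstract produce populations that are hard to separate.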
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Chaichoompu, Kridsadakorn ;  Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Tongsima, Sissades;  National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand > Genome Technology Research Unit > Biostatistics and Bioinformatics Laboratory
Shaw, Philip James;  National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand > Medical Molecular Biology Research Unit > Protein-Ligand Engineering and Molecular Biology Laboratory
Sakuntabhai, Anavaj;  Institut Pasteur, France > Functional Genetics of Infectious Diseases Unit
Van Steen, Kristel  ;  Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Language :
English
Title :
A novel unsupervised clustering approach with multiple data types to reveal fine-level structure
Publication date :
21 May 2016
Event name :
the European Human Genetics Conference 2016 (ESHG 2016), Barcelona
Event place :
Barcelona, Spain
Event date :
21-24 May 2016
Audience :
International
Name of the research project :
Foresting in Integromics Inference
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Available on ORBi :
since 20 July 2016
