Poster (Scientific congresses and symposiums)
Capturing fine-level structure using unsupervised clustering method with multiple data types
Chaichoompu, Kridsadakorn; Tongsima, Sissades; Shaw, Philip James et al.
2016, byteMAL (Bioinformatics for Young inTernational researchers) conference in Aachen
 

Abstract :
Several methods exist to detect shared genetic ancestry or to identify population substructure from SNP-based or haplotype-based information (Price et al. 2006, Lawson et al. 2012). Here, we propose an unsupervised clustering method built on the ipPCA algorithm (Intarapanich et al. 2009). The method supports both ordinal and categorical data and can be applied to panels of single-locus and/or multi-locus data, or to gene-based integrative summaries (Fouladi et al. 2015). It proceeds iteratively, using binary and ternary splits based on multivariate Gaussian mixture modeling of principal components (PCs) and Clustering EM estimation as in Lebret et al. (2015). To evaluate its performance, we examined simulated scenarios with 2-4 populations, 500-8,000 individuals, 5,000-20,000 independent SNPs in Hardy-Weinberg equilibrium (HWE), and FST=[0.0007,0.006] (Balding and Nichols 1995), with 100 replicates per scenario. SNPs were treated as categorical or continuous, including ancestry-corrected SNPs. Haplotype-based runs used HapMap 3 data: CHB, CHD, and JPT. In simulated scenarios of extremely subtle structure (FST=[0.0009,0.006]), the method achieved a population classification accuracy of 92% or greater, which was superior to ipPCA. For the HapMap populations, it also gave promising results in detecting fine structure. We are convinced that our method has the potential to detect fine-level structure and that it will be important in molecular reclassification studies of patients once underlying population structure has been removed.
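
The simulation setup described in the abstract can be illustrated with a minimal sketch, assuming the usual Balding-Nichols construction: each population's allele frequency at a SNP is drawn from a Beta distribution centred on an ancestral frequency, and genotypes are then drawn under HWE. This is not the authors' code. The function names `simulate_balding_nichols` and `naive_fst`, the uniform range for ancestral frequencies, and the simple FST estimator are all assumptions made for illustration.

```python
import random


def simulate_balding_nichols(n_pops, n_per_pop, n_snps, fst, seed=0):
    """Simulate independent SNPs for several populations (hypothetical helper).

    Returns (genotypes, labels), where genotypes[i][j] is the 0/1/2 allele
    count of individual i at SNP j and labels[i] is the population index.
    """
    rng = random.Random(seed)
    # Ancestral allele frequencies; the 0.1-0.9 range is an assumption.
    ancestral = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    scale = (1.0 - fst) / fst
    # Balding-Nichols: population frequency ~ Beta(p(1-F)/F, (1-p)(1-F)/F).
    pop_freqs = [
        [rng.betavariate(p * scale, (1.0 - p) * scale) for p in ancestral]
        for _ in range(n_pops)
    ]
    genotypes, labels = [], []
    for k in range(n_pops):
        for _ in range(n_per_pop):
            # HWE: a genotype is the sum of two independent allele draws.
            row = [(rng.random() < f) + (rng.random() < f) for f in pop_freqs[k]]
            genotypes.append(row)
            labels.append(k)
    return genotypes, labels


def naive_fst(genotypes, labels):
    """Crude ratio-of-sums FST estimate (assumption: no sampling correction)."""
    pops = sorted(set(labels))
    num = den = 0.0
    for j in range(len(genotypes[0])):
        freqs = []
        for k in pops:
            counts = [g[j] for g, lab in zip(genotypes, labels) if lab == k]
            freqs.append(sum(counts) / (2.0 * len(counts)))
        mean = sum(freqs) / len(freqs)
        # Between-population variance of sample allele frequencies.
        num += sum((f - mean) ** 2 for f in freqs) / (len(freqs) - 1)
        den += mean * (1.0 - mean)
    return num / den
```

Because `naive_fst` does not correct for finite sample size, it tends to overestimate FST when few individuals are sampled per population, which matters most at the very small FST values quoted in the abstract.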
Disciplines :
Life sciences: Multidisciplinary, general & others
Author, co-author :
Chaichoompu, Kridsadakorn; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Tongsima, Sissades; National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand > Genome Technology Research Unit > Biostatistics and Bioinformatics Laboratory
Shaw, Philip James; National Center for Genetic Engineering and Biotechnology (BIOTEC), Thailand > Medical Molecular Biology Research Unit > Protein-Ligand Engineering and Molecular Biology Laboratory
Sakuntabhai, Anavaj; Institut Pasteur, France > Functional Genetics of Infectious Diseases Unit
Van Steen, Kristel; Université de Liège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Bioinformatique
Language :
English
Title :
Capturing fine-level structure using unsupervised clustering method with multiple data types
Publication date :
17 June 2016
Event name :
byteMAL (Bioinformatics for Young inTernational researchers) conference in Aachen
Event place :
Aachen, Germany
Event date :
17 June 2016
Name of the research project :
Foresting in Integromics Inference
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique [BE]
Available on ORBi :
since 20 July 2016
