References of "Chaichoompu, Kridsadakorn"
IPCAPS: an R package for iterative pruning to capture population structure
Chaichoompu, Kridsadakorn ULiege; Abegaz Yazew, Fentaw; Tongsima, Sissades et al

In Bioinformatics: Application Notes (in press)

Resolving population genetic structure is challenging, especially when dealing with closely related populations. Although Principal Component Analysis (PCA)-based methods and genomic variation with single nucleotide polymorphisms (SNPs) are widely used to describe shared genetic ancestry, there is room for improvement in targeting fine-level population structure. This work presents an R package called IPCAPS, which uses SNP information to resolve possibly fine-level population structure. The IPCAPS routines are built on the iterative pruning Principal Component Analysis (ipPCA) framework to systematically assign individuals to genetically similar subgroups. In each iteration, our tool detects and eliminates outliers to avoid misclassification. It can also be extended to detect subtle subgrouping in patients. In addition, IPCAPS supports different measurement scales for the variables used to identify substructure, so panels of gene expression and methylation data can be accommodated.

Detailed reference viewed: 62 (12 ULiège)
Capturing fine-scale population structure towards molecular reclassification of patients
Chaichoompu, Kridsadakorn ULiege

Doctoral thesis (2017)

During the past decades, population structure analysis has played an important role in stratifying populations and tracing population ancestries. Population structure is mainly due to non-random mating between subgroups in a population, for various reasons, whether social, cultural, or geographical. Genetic structure in populations may also arise from known or unknown family relationships. Complex disease analyses, particularly case-control genetic association studies, can be affected by so-called cryptic relatedness, which refers to unobserved ancestral relationships between study individuals. Because population structure may confound the results of genetic association studies and of studies that aim to detect clinically relevant substructure in patients, detecting it is essential. Notably, once unwanted population structure has been removed in the molecular detection of patient subtypes, any remaining structure is likely to be subtle or fine-scale. In this thesis, we developed a novel genetic structure detection tool, hereafter referred to as IPCAPS, which can also be used as, or extended to, a tool for fine-scale reclassification of patients. IPCAPS uses the fixation index (FST) to measure the distance between clusters for iterative loop termination; an FST > 0.001 is typically seen as evidence for genetic differentiation between European populations. We also introduced a novel heuristic called EigenFit as one of the stopping criteria. Although our tool was developed to easily accommodate multiple data types, we illustrated the concept of IPCAPS and its performance on simulated and real-life data using panels of genome-wide SNP data. SNPs (Single Nucleotide Polymorphisms) are the most common type of genetic variation among people; there are roughly 10 million of them in the human genome.
We evaluated the performance of IPCAPS in a variety of simulation scenarios, including varying sample sizes, varying SNP panel sizes, the absence or presence of outliers, and large or very small genetic separation between synthetic populations. Performance was measured by accuracy and computation time. We observed that our method generally outperformed a selection of other iterative pruning-based methods, such as ipPCA, iNJclust, and SHIPS. Also, in the presence of outliers, the computation time of IPCAPS was largely affected by sample size, not by the number of SNPs included in the analysis. We furthermore validated our tools and proposed protocols on a variety of real-life datasets. These datasets differed in complexity and ranged from worldwide sample collections, through regional populations, to geographically confined samples. In particular, we analyzed data from the International HapMap Project, the 1000 Genomes Project, Africa, and Thailand. We proposed a suitable protocol to correct for population stratification and to perform patient subgrouping in samples from the International IBD Genetics Consortium (IBD: inflammatory bowel disease). All developed analysis protocols include guidelines for interpreting the identified strata. In conclusion, IPCAPS is a promising structure detection tool. It identified previously unreported fine structure in African and HapMap populations. IPCAPS analysis also suggested the presence of at least 3 subtypes of Crohn’s disease and at least 3 subtypes of ulcerative colitis patients. More work is needed to evaluate the importance of these findings in clinical practice and for precision medicine.
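
The FST-based termination idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the IPCAPS implementation: it uses Hudson's FST estimator (a ratio of averages over SNPs) on additively coded genotypes, and the names `hudson_fst` and `keep_split` are hypothetical. The default threshold of 0.001 is the value quoted in the abstract.

```python
def allele_freq(genotypes, snp):
    """Alternative-allele frequency at one SNP; genotypes are 0/1/2 allele counts."""
    return sum(ind[snp] for ind in genotypes) / (2 * len(genotypes))


def hudson_fst(group1, group2):
    """Hudson-type FST between two groups, as a ratio of averages over SNPs.

    Each group is a list of individuals; each individual is a list of
    additively coded genotypes (0, 1 or 2 copies of the alternative allele).
    """
    n1, n2 = 2 * len(group1), 2 * len(group2)  # number of sampled alleles
    num = den = 0.0
    for snp in range(len(group1[0])):
        p1, p2 = allele_freq(group1, snp), allele_freq(group2, snp)
        # numerator: squared frequency difference, corrected for sampling noise
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # denominator: probability that two alleles, one per group, differ
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0


def keep_split(group1, group2, threshold=0.001):
    """Hypothetical termination rule: accept a split only if FST exceeds threshold."""
    return hudson_fst(group1, group2) > threshold
```

Under this rule, a candidate split whose two clusters give an FST at or below the threshold would be rejected, and that branch of the iterative loop would stop. The EigenFit criterion mentioned in the abstract is not modelled here.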

Detailed reference viewed: 34 (6 ULiège)
Peer Reviewed
Using IPCAPS to identify fine-scale population structure
Chaichoompu, Kridsadakorn ULiege; Fentaw Abegaz, Yazew ULiege; Tongsima, Sissades et al

Poster (2017, September 09)

Detailed reference viewed: 35 (8 ULiège)
Determining fine population structure using iterative pruning
Chaichoompu, Kridsadakorn ULiege; Yazew, Fentaw Abegaz; Tongsima, Sissades et al

Poster (2017, July 10)

SNP-based information is used in several existing clustering methods to detect shared genetic ancestry or to identify population substructure (Price et al. 2006, Raj et al. 2016). Here, we present an unsupervised clustering algorithm called the iterative pruning method to capture population structure (IPCAPS). Our method supports ordinal data, so it can be applied directly to SNP data to identify fine-level population structure, and it is built on the iterative pruning Principal Component Analysis (ipPCA) algorithm (Intarapanich et al. 2009). IPCAPS involves an iterative process using multiple splits based on multivariate Gaussian mixture modeling of principal components and Clustering EM estimation as in Lebret et al. (2015). In each iteration, rough clusters and outliers are also identified using our own method, called RubikClust. The fixation index (FST) measures the genetic distance between populations, and European populations separated by an FST of 0.001 may be considered genetically distinct (Tian et al. 2008, Huckins et al. 2014). To observe fine-level population structure using FST, we examined simulated scenarios of one population, 500-8,000 individuals, and 5,000-10,000 independent SNPs in HWE (Balding and Nichols 1995), with 100 replicates for each scenario. The simulated SNPs used additive coding, and no missing genotypes were generated. As a negative control, we split the individuals into two groups using k-means. The FST values between the resulting groups were lower than 0.0008, which can therefore be defined as the minimum FST for detecting fine-level population structure. To evaluate the performance of our method, we tested different simulated data sets of 2-3 populations, 250 individuals per population, 10,000 independent SNPs in HWE, and FST=[0.0008,0.005], with 100 replicates for each data set. For real-life data sets, we applied IPCAPS to Thai (Wangkumhang et al. 2013) and HapMap populations.
In simulated scenarios of extremely subtle structure (FST=[0.0009,0.005]), our method achieved higher population classification accuracy than ipPCA. Fine-level structure was also detected in the Thai population, as well as in the HapMap populations. We are convinced that IPCAPS has the potential to detect fine-level structure and that it will be important in molecular reclassification studies of patients once underlying population structure has been removed.
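
The Balding–Nichols simulation design described above can be sketched in Python as follows. It is a minimal illustration under assumptions, not the authors' simulation code: the function name `balding_nichols_genotypes` is hypothetical, and ancestral allele frequencies are drawn from Uniform(0.05, 0.95), a range the abstract does not specify. Each population's allele frequency is drawn from Beta(p(1-F)/F, (1-p)(1-F)/F), which has mean p and variance F·p(1-p), and genotypes are then drawn under Hardy-Weinberg equilibrium.

```python
import random


def balding_nichols_genotypes(n_pops, n_per_pop, n_snps, fst, seed=None):
    """Simulate additively coded genotypes (0/1/2) under the Balding-Nichols model.

    Returns (genotypes, labels): one list of SNP genotypes per individual,
    and the index of the population each individual was drawn from.
    """
    rng = random.Random(seed)
    scale = (1 - fst) / fst
    # ancestral allele frequencies; the Uniform(0.05, 0.95) range is an assumption
    ancestral = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    # population-specific frequencies drift around the ancestral value
    pop_freqs = [
        [rng.betavariate(p * scale, (1 - p) * scale) for p in ancestral]
        for _ in range(n_pops)
    ]
    genotypes, labels = [], []
    for k, freqs in enumerate(pop_freqs):
        for _ in range(n_per_pop):
            # two independent allele draws per SNP give HWE genotype counts
            genotypes.append([(rng.random() < q) + (rng.random() < q) for q in freqs])
            labels.append(k)
    return genotypes, labels
```

For example, `balding_nichols_genotypes(2, 250, 10_000, 0.001, seed=1)` would produce a data set with the same shape as the 2-population scenarios above; pure Python is slow at that size, so a vectorized implementation would be preferable in practice.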

Detailed reference viewed: 54 (5 ULiège)
Determining fine population structure using iterative pruning
Chaichoompu, Kridsadakorn ULiege; Yazew, Fentaw Abegaz; Tongsima, Sissades et al

Poster (2017, April 25)

Detailed reference viewed: 18 (3 ULiège)
Peer Reviewed
Using unsupervised clustering method and SNP-based information to identify fine-level population structure
Chaichoompu, Kridsadakorn ULiege; Yazew, Fentaw Abegaz; Tongsima, Sissades et al

Poster (2017, February 01)

Detailed reference viewed: 55 (4 ULiège)
Capturing fine-level structure using unsupervised clustering method with multiple data types
Chaichoompu, Kridsadakorn ULiege; Tongsima, Sissades; Shaw, Philip James et al

Poster (2016, June 17)

Several methods exist to detect shared genetic ancestry or to identify population substructure using SNP-based or haplotype-based information (Price et al. 2006, Lawson et al. 2012). Here, we propose an unsupervised clustering method built on the ipPCA algorithm (Intarapanich et al. 2009). Our method supports both ordinal and categorical data, and it can be applied to panels of single-locus and/or multi-locus data, or to gene-based integrative summaries (Fouladi et al. 2015). It involves an iterative process using binary and ternary splits based on multivariate Gaussian mixture modeling of PCs and Clustering EM estimation as in Lebret et al. (2015). To evaluate its performance, we examined different simulated scenarios of 2-4 populations, 500-8,000 individuals, 5,000-20,000 independent SNPs in HWE, and FST=[0.0007,0.006] (Balding and Nichols 1995), with 100 replicates for each scenario. SNPs were treated as categorical or continuous, including ancestry-corrected SNPs. Haplotype-based runs used HapMap 3 data: CHB, CHD, and JPT. In simulated scenarios of extremely subtle structure (FST=[0.0009,0.006]), a population classification accuracy of 92% or greater was obtained, which was superior to ipPCA. Promising results for detecting fine structure were also obtained for the HapMap populations. We are convinced that our method has the potential to detect fine-level structure and that it will be important in molecular reclassification studies of patients once underlying population structure has been removed.

Detailed reference viewed: 42 (6 ULiège)
Capturing structure in IIBDGC samples
Chaichoompu, Kridsadakorn ULiege

Scientific conference (2016, May 31)

Detailed reference viewed: 14 (1 ULiège)
A novel unsupervised clustering approach with multiple data types to reveal fine-level structure
Chaichoompu, Kridsadakorn ULiege; Tongsima, Sissades; Shaw, Philip James et al

Poster (2016, May 21)

Introduction: Several methods exist to identify population substructure that is due to shared genetic ancestry or regional proximity. These may be SNP-based or haplotype-based (Price et al. 2006, Lawson et al. 2012). Here, we present a flexible unsupervised clustering approach that is built on the ipPCA machinery (Intarapanich et al. 2009). Methods: Our method supports both numeric and categorical data, and can be applied to panels of SNPs and/or haplotypes, or to gene-based integrative summaries (Fouladi et al. 2015). Unlike ipPCA, our method involves an iterative process using binary and ternary splits based on multivariate Gaussian mixture modeling of PCs and Clustering EM (CEM) estimation as in Lebret et al. (2015). To assess performance, we considered different simulated scenarios of FST=[0.0005,0.006], 5,000-20,000 independent SNPs in HWE, 500-8,000 individuals, and 2-4 populations (Balding and Nichols 1995), with 100 replicates for each scenario. SNPs were treated as categorical or continuous (including ancestry-corrected SNPs). Haplotype-based runs used HapMap 3 data: CHB-JPT (FST=0.007) and CEU-TSI (FST=0.004). Results and Conclusion: In simulated scenarios of extremely subtle structure (FST=[0.0009,0.002]), a population classification accuracy of 92.56% or greater was obtained, which was superior to ipPCA. Promising results for detecting fine structure were also obtained for the HapMap populations. We believe that the ability of our approach to detect subtle structure, including outlier individuals, will be important in molecular reclassification studies of patients after underlying population patterns have been removed. Grants: KC and KVS acknowledge FNRS, AS acknowledges ANR, ST acknowledges NSTDA, and PJS acknowledges TRF.

Detailed reference viewed: 60 (8 ULiège)
Iterative pruning method of unsupervised clustering for categorical data
Chaichoompu, Kridsadakorn ULiege; Tongsima, Sissades; Shaw, Philip James et al

Poster (2016, April 03)

Single Nucleotide Polymorphisms (SNPs) are commonly used to identify population structure. Iterative pruning Principal Component Analysis (ipPCA) uses SNP profiles to assign individuals to subpopulations without making assumptions about ancestry. The strategy can be extrapolated to patient samples to identify molecular classes of patients. Investigating the utility of substructure detection using profiles based on pre-defined genomic regions of interest, rather than profiles based on SNPs, is challenging. Using principles outlined in Fouladi (2015), we can construct gene-based categorical variables representing different summary gene profiles in a region. These new gene-based constructs no longer have an equal number of unordered category levels. Here, we present C-PCA, an extension of ipPCA that performs iterative pruning on categorical variables using optimal scaling. It performs non-linear principal component analysis to handle possibly non-linearly related variables with different measurement levels. To show the power of C-PCA compared to ipPCA, we simulated 500 individuals and assigned them to two populations of equal size. We considered genetic distances between the populations, measured by the fixation index (FST), from 0.001 to 0.006. For each dataset, we simulated 10,000 independent random SNPs for 100 replicates using the Balding–Nichols model. These SNPs were used as numeric variables in ipPCA and as categorical variables in C-PCA. In conclusion, like ipPCA, we expect C-PCA to perform well in the presence of fine substructures. This paves the way to apply C-PCA to DNA-seq data, with input categorical variables derived from genomic regions of interest to which common and rare variants are mapped. We foresee additional advantages of C-PCA in this context, since region-based categorical variables are likely to be non-linearly associated against the background of underlying gene-gene interaction networks. C-PCA is implemented in R.
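
Optimal scaling assigns numeric values (quantifications) to the levels of a categorical variable. As a rough sketch of one step of this idea, in the style of the alternating least squares updates used in homogeneity analysis and categorical PCA, each level of a nominal variable can be quantified by the mean current object score of the individuals in that level. This is an illustrative assumption, not C-PCA's actual code; the function name `quantify_nominal` and the standardization choice are hypothetical.

```python
from statistics import fmean, pstdev


def quantify_nominal(categories, scores):
    """One optimal-scaling update for a nominal variable (sketch).

    categories: the category level of each individual (any hashable labels)
    scores:     current object scores of the same individuals (e.g. a PC)
    Each level is replaced by the centroid of its members' scores, and the
    quantified variable is centred and scaled to unit variance.
    """
    members = {}
    for cat, s in zip(categories, scores):
        members.setdefault(cat, []).append(s)
    centroid = {cat: fmean(vals) for cat, vals in members.items()}
    quantified = [centroid[cat] for cat in categories]
    mu, sd = fmean(quantified), pstdev(quantified)
    standardized = [(q - mu) / sd if sd > 0 else 0.0 for q in quantified]
    return standardized, centroid
```

In a full alternating least squares scheme, object scores and category quantifications would be updated in turn until convergence; the sketch shows only the quantification update, which works for levels whose count differs between variables.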

Detailed reference viewed: 50 (4 ULiège)
Detecting patient subgroups using reduced set of disease-related markers with iterative pruning Principal Component Analysis (ipPCA)
Chaichoompu, Kridsadakorn ULiege; Cleynen, Isabelle; Fouladi, Ramouna ULiege et al

Poster (2015, October 03)

Genetic markers such as Single Nucleotide Polymorphisms (SNPs) can be used to find subgroups of populations or patients with carefully selected clustering algorithms. Iterative pruning principal component analysis (ipPCA) has been shown to be a powerful tool to identify fine substructures within general populations based on SNP profiles. Usually, SNPs contributing to such profiles have passed rigorous quality control procedures, similar to those used for GWAS. Alternatively, attention is restricted to a smaller subset, such as PCA-correlated SNPs. Here, we applied ipPCA to real-life data consisting of the 163 known inflammatory bowel disease (IBD)-associated loci in 13,400 healthy individuals and 29,500 IBD patients (16,902 with Crohn’s disease (CD) and 12,598 with ulcerative colitis (UC)) from the IIBDGC. Prior to clustering by ipPCA, we regressed out, in each group separately, the first five Principal Components (PCs) computed from a filtered panel of genome-wide SNPs, to account for general population strata. Next, we applied ipPCA to the healthy group to learn whether population-specific partitioning was present in controls. Then we performed three subphenotype analyses: CD only, UC only, and the combined group of CD and UC patients (IBD). For each patient subgroup analysis and for the ipPCA analysis on controls, we highlighted and compared the key SNP drivers. CD patients could be molecularly reclassified into two groups, as could UC patients. The combined patient group could be subdivided into four groups. Finally, we compared demographic and clinical features among the different groups and looked for meaningful characterizations of adjusted patient clusters by performing pathway analysis on driver genes.
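
Regressing out the first five PCs amounts to replacing each SNP column by its residuals from a least-squares fit on an intercept plus the PC scores. The pure-Python sketch below illustrates this under assumptions: the PC scores are taken as precomputed inputs, and the helper names `solve` and `regress_out` are hypothetical rather than part of ipPCA.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regress_out(genotypes, pcs):
    """Replace each SNP column by its residuals after regressing on the PCs.

    genotypes: individuals x SNPs (additive 0/1/2 coding)
    pcs:       individuals x k PC scores (e.g. k = 5), assumed precomputed
               from a filtered genome-wide SNP panel
    """
    design = [[1.0] + list(row) for row in pcs]  # intercept + PC scores
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    n_snps = len(genotypes[0])
    residuals = [[0.0] * n_snps for _ in genotypes]
    for s in range(n_snps):
        y = [ind[s] for ind in genotypes]
        xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
        beta = solve(xtx, xty)  # normal equations (X'X) beta = X'y
        for r, (d, yi) in enumerate(zip(design, y)):
            residuals[r][s] = yi - sum(bi * di for bi, di in zip(beta, d))
    return residuals
```

With NumPy, the same residuals could be obtained more efficiently and more stably with `numpy.linalg.lstsq`; the explicit normal equations are used here only to keep the example dependency-free.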

Detailed reference viewed: 34 (5 ULiège)
ipPCA clustering to find substructures in populations or patient collections
Chaichoompu, Kridsadakorn ULiege

Scientific conference (2014, December 02)

Detailed reference viewed: 15 (4 ULiège)
Combining Genotype with LD-based haplotype information as input for iterative pruning principal component analysis (ipPCA) to improve population clustering
Chaichoompu, Kridsadakorn ULiege; Fouladi, Ramouna ULiege; Wangkumhang, Pongsakorn et al

Conference (2014, November 26)

Single Nucleotide Polymorphisms (SNPs) are commonly used to capture variations between populations, and genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns. We aim to identify and differentiate between subpopulations using a rich set of genetic markers, because doing so with reduced sets of genetic markers can become challenging, especially when similar geographic regions are involved or when spurious patterns are likely to exist. Notably, haplotype composition and the pattern of LD between markers may vary between larger populations but may also play a role within more confined geographic regions. Indeed, the structure of haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Here, we use iterative pruning principal component analysis (ipPCA) [1] to identify and characterize subpopulations in an unsupervised way. Furthermore, we propose to combine an LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructures. Despite the complexities associated with haplotype inference, added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata. As input data, either pruned genome-wide SNP data or multilocus haplotype information derived from the genome-wide SNP panel is used. Preliminary results indicate that ipPCA applied to pruned SNP data and ipPCA that explicitly uses multilocus information (haplotypes) give complementary information about population substructure for geographically confined populations. In fact, the two approaches address different aspects of population structure. [1] Intarapanich, A. et al. (2009), BMC Bioinformatics. 10: p. 382.

Detailed reference viewed: 15 (3 ULiège)
Combining genotype with allelic association as input for iterative pruning principal component analysis (ipPCA) to resolve population substructures
Chaichoompu, Kridsadakorn ULiege; Fouladi, Ramouna ULiege; Wangkumhang, Pongsakorn et al

Poster (2014, August 28)

Single Nucleotide Polymorphisms (SNPs) are commonly used to capture variations between populations, and genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns. Notably, haplotype composition and the pattern of LD between markers may vary between larger populations but may also play a role within more confined geographic regions. Indeed, knowledge about haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Here, we use iterative pruning principal component analysis (ipPCA) [Intarapanich 2009] to identify and characterize subpopulations in an unsupervised way using a rich set of genetic markers, since doing so with reduced sets of genetic markers can become challenging, especially when similar geographic regions are involved or when spurious patterns are likely to exist. As input data, either pruned genome-wide SNP data or multilocus haplotype information derived from the genome-wide SNP panel is used. These approaches are applied to real-life data from 4,028 Vietnamese individuals [Khor 2012]. Preliminary results indicate that ipPCA applied to pruned SNP data and ipPCA that explicitly uses multilocus information (haplotypes) give complementary information about population substructure for geographically confined populations. The two approaches address different aspects of population structure. In conclusion, we propose to combine an LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructures. Despite the complexities associated with haplotype inference, added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata.

Detailed reference viewed: 15 (5 ULiège)
Detecting population substructures
Chaichoompu, Kridsadakorn ULiege

Scientific conference (2014, June 05)

Detailed reference viewed: 13 (2 ULiège)
Peer Reviewed
LD-based haplotype encoding scheme with iterative pruning principal component analysis (ipPCA) to retrieve population substructures
Chaichoompu, Kridsadakorn ULiege; Fouladi, Ramouna ULiege; Wangkumhang, Pongsakorn et al

Poster (2014, April 29)

Objective: To identify and differentiate between subpopulations using a rich set of genetic markers, because doing so with reduced sets of genetic markers can become challenging, especially when similar geographic regions are involved or when spurious patterns are likely to exist. Method: Single Nucleotide Polymorphisms (SNPs) are commonly used to capture variations between populations, and genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns. Notably, haplotype composition and the pattern of LD between markers may vary between larger populations but may also play a role within more confined geographic regions. Indeed, knowledge about haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Here, we use iterative pruning principal component analysis (ipPCA) [1] to identify and characterize subpopulations in an unsupervised way. As input data, either pruned genome-wide SNP data (using PLINK 1.9 with the "indep-pairwise" option, window size = 100k, r² < 0.25) or multilocus haplotype information derived from the genome-wide SNP panel (using BEAGLE 3.3.2 to infer haplotypes) is used. These approaches are applied to real-life data from 992 Thai individuals [2]. Result: Preliminary results indicate that ipPCA applied to pruned SNP data and ipPCA that explicitly uses multilocus information (haplotypes) give complementary information about population substructure for geographically confined populations such as the Thai samples in this study. Both methods address different aspects of population structure. Detailed simulation studies are needed to identify the optimal scenarios for haplotype-based ipPCA. Conclusion: In this work, we propose to combine an LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructures.
Despite the complexities associated with haplotype inference, added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata. References 1. Intarapanich, A., et al., Iterative pruning PCA improves resolution of highly structured populations. BMC Bioinformatics, 2009. 10: p. 382. 2. Wangkumhang, P., et al., Insight into the peopling of Mainland Southeast Asia from Thai population genetic structure. PLoS One, 2013. 8(11): p. e79522.
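
The idea behind LD-based pruning can be illustrated with a simplified greedy pairwise r² filter. This is a sketch under assumptions, not PLINK's `--indep-pairwise` algorithm: the window is counted in SNPs rather than physical distance, there is no step size, the earlier SNP of a correlated pair is always the one kept, and the function names `r_squared` and `ld_prune` are hypothetical. The 0.25 default mirrors the r² cutoff quoted in the abstract.

```python
from statistics import fmean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: treated as uncorrelated
    return sxy * sxy / (sxx * syy)


def ld_prune(snp_columns, window=50, r2_max=0.25):
    """Greedy pairwise LD pruning over SNPs in genomic order (simplified sketch).

    snp_columns: one genotype vector per SNP, ordered by position.
    A SNP is kept only if its r^2 with every already-kept SNP among the
    previous `window` SNPs is below r2_max. Returns indices of kept SNPs.
    """
    kept = []
    for j, col in enumerate(snp_columns):
        recent = [k for k in kept if j - k < window]
        if all(r_squared(snp_columns[k], col) < r2_max for k in recent):
            kept.append(j)
    return kept
```

Perfectly correlated or anti-correlated SNPs both give r² = 1 and are pruned, since LD pruning cares about redundancy rather than the sign of the association.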

Detailed reference viewed: 170 (23 ULiège)
Haplotype information combined with iterative pruning PCA (ipPCA) to improve population clustering
Chaichoompu, Kridsadakorn ULiege; Fouladi, Ramouna; Wangkumhang, Pongsakorn et al

Scientific conference (2014, April 01)

Single Nucleotide Polymorphisms (SNPs) are commonly used to capture variations between populations. Genome-wide SNP data are often pruned based on linkage disequilibrium (LD) patterns, or small subsets of SNPs (e.g. PCA-correlated SNPs) are selected to reproduce the genomic structure of the complete data set. Identifying and differentiating between subpopulations using such a reduced set can become challenging, especially when similar geographic regions are involved or when spurious patterns are likely to exist. Although PCA-based methods can resolve structure, they cannot infer ancestry. On the other hand, the structure of haplotypes in unrelated individuals can reveal useful information about genetic ancestry. Notably, haplotype composition and the pattern of LD between markers may vary between larger populations but may also play a role within more confined geographic regions. In addition, iterative pruning principal component analysis (ipPCA) has been shown to be a powerful tool to cluster subpopulations based on SNP profiles. Despite the complexities associated with haplotype inference, we argue that added value can be obtained when the LD structure between SNPs is exploited in the search for relevant population strata. In this work, we propose to combine a novel LD-based haplotype encoding scheme with the ipPCA machinery to retrieve fine population substructures. The approach is compared to state-of-the-art methods in the context of population substructure and admixture analysis.

Detailed reference viewed: 108 (18 ULiège)
A system for DNA mapping from pyrosequencer.
Assawamakin, Anunchai; Kulawonganunchai, Supasak; Chaichoompu, Kridsadakorn ULiege et al

Patent (2013)

Detailed reference viewed: 16 (1 ULiège)