Article (Scientific journals)
Benchmarking phasing software with a whole-genome sequenced cattle pedigree.
Oget, Claire; Kadri, Naveen Kumar; Moreira, Gabriel Costa Monteiro et al.
2022, in BMC Genomics, 23 (1), p. 130
Peer Reviewed verified by ORBi
 

Files


Full Text
Oget_BMC_Genomics2022.pdf
Publisher postprint (3.44 MB)

Details



Keywords :
Cattle; Haplotype; Phasing; Sequencing data; Algorithms; Animals; Cattle/genetics; Dogs; Haplotypes; Pedigree; Polymorphism, Single Nucleotide; Software; Benchmarking; Genome; Biotechnology; Genetics
Abstract :
[en] BACKGROUND: Accurate haplotype reconstruction is required in many applications in quantitative and population genomics. Different phasing methods are available, but their accuracy must be evaluated for samples with different properties (population structure, marker density, etc.). We took advantage of whole-genome sequence data available for a Holstein cattle pedigree containing 264 individuals, including 98 trios, to evaluate several population-based phasing methods. These data represent a typical example of a livestock population, with low effective population size, high levels of relatedness and long-range linkage disequilibrium.

RESULTS: After stringent filtering of our sequence data, we evaluated several population-based phasing programs, including one or more versions of AlphaPhase, ShapeIT, Beagle, Eagle and FImpute. For validation, we used the 98 individuals that had both parents sequenced. Their haplotypes, reconstructed from Mendelian segregation rules, were considered the gold standard to assess the performance of population-based methods in two scenarios. In the first scenario, only these 98 individuals were phased; in the second, all 264 sequenced individuals were phased simultaneously, ignoring the pedigree relationships. We assessed phasing accuracy based on switch error counts (SEC) and rates (SER), lengths of correctly phased haplotypes, and the probability that there is no phasing error between a pair of SNPs as a function of their distance. For most evaluated metrics or scenarios, the best software was either ShapeIT4.1 or Beagle5.2, both methods resulting in particularly high phasing accuracies. For instance, ShapeIT4.1 achieved a median SEC of 50 per individual and a mean haplotype block length of 24.1 Mb (scenario 2). These statistics are remarkable since the methods were evaluated with a map of 8,400,000 SNPs, and this corresponds to only one switch error every 40,000 phased informative markers. When more relatives were included in the data (scenario 2), FImpute3.0 reconstructed extremely long segments without errors.

CONCLUSIONS: We report extremely high phasing accuracies in a typical livestock sample. ShapeIT4.1 and Beagle5.2 proved to be the most accurate, particularly for phasing long segments and in the first scenario. Nevertheless, most tools achieved high accuracy at short distances and would be suitable for applications requiring only local haplotypes.
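The switch-error metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the input encoding (the allele carried by the first haplotype at each heterozygous site of one individual) and the choice of denominator for the rate are assumptions made here for illustration, and the paper's exact definitions may differ.

```python
from typing import List, Sequence


def switch_errors(truth: Sequence[int], inferred: Sequence[int]) -> List[int]:
    """Indices of heterozygous sites at which the inferred phase flips
    relative to the gold-standard phase.

    Both inputs hold the allele (0 or 1) on the first haplotype at each
    heterozygous site, in map order (hypothetical encoding).
    """
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    # At each site, does inferred haplotype 1 agree with true haplotype 1?
    same = [t == i for t, i in zip(truth, inferred)]
    # A switch error occurs wherever that orientation changes between
    # two consecutive heterozygous sites.
    return [k for k in range(1, len(same)) if same[k] != same[k - 1]]


def switch_error_count(truth: Sequence[int], inferred: Sequence[int]) -> int:
    """Switch error count (SEC) for one individual."""
    return len(switch_errors(truth, inferred))


def switch_error_rate(truth: Sequence[int], inferred: Sequence[int]) -> float:
    """Switch error rate (SER): switches divided by the number of
    adjacent pairs of heterozygous sites (assumed denominator)."""
    pairs = len(truth) - 1
    if pairs < 1:
        return 0.0
    return switch_error_count(truth, inferred) / pairs


def correct_block_lengths(positions: Sequence[int],
                          truth: Sequence[int],
                          inferred: Sequence[int]) -> List[int]:
    """Physical span (bp) of each run of sites phased without a switch."""
    if len(positions) != len(truth):
        raise ValueError("positions must match the heterozygous sites")
    if not positions:
        return []
    breaks = switch_errors(truth, inferred)
    starts = [0] + breaks
    ends = [b - 1 for b in breaks] + [len(positions) - 1]
    return [positions[e] - positions[s] for s, e in zip(starts, ends)]


if __name__ == "__main__":
    # Toy data: six heterozygous sites, 100 bp apart.
    pos = [100, 200, 300, 400, 500, 600]
    gold = [0, 0, 1, 0, 1, 1]
    test = [0, 0, 0, 1, 1, 1]
    print(switch_error_count(gold, test))
    print(switch_error_rate(gold, test))
    print(correct_block_lengths(pos, gold, test))
```

Comparing orientation rather than raw alleles makes the count invariant to which haplotype a program happens to label first, which is why a fully flipped but otherwise correct phasing yields zero switch errors.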
Disciplines :
Agriculture & agronomy
Genetics & genetic processes
Author, co-author :
Oget, Claire; Université de Liège - ULiège > Département de gestion vétérinaire des Ressources Animales (DRA)
Kadri, Naveen Kumar; Animal Genomics, ETH Zürich, 8092, Zürich, Switzerland
Moreira, Gabriel Costa Monteiro; Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège (B34), 4000, Liège, Belgium
Karim, Latifa; Université de Liège - ULiège > Département de gestion vétérinaire des Ressources Animales (DRA) > Génomique animale
Coppieters, Wouter; Université de Liège - ULiège > Département de gestion vétérinaire des Ressources Animales (DRA) > Génomique animale
Georges, Michel; Université de Liège - ULiège > GIGA > GIGA Cardiovascular Sciences - Molecular Biomimetic and Protein Engineering Laboratory
Druet, Tom; Université de Liège - ULiège > GIGA > GIGA Medical Genomics - Unit of Animal Genomics
Language :
English
Title :
Benchmarking phasing software with a whole-genome sequenced cattle pedigree.
Publication date :
2022
Journal title :
BMC Genomics
eISSN :
1471-2164
Publisher :
BioMed Central Ltd, England
Volume :
23
Issue :
1
Pages :
130
Peer reviewed :
Peer Reviewed verified by ORBi
Tags :
CÉCI : Consortium des Équipements de Calcul Intensif
Funders :
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding text :
This work was funded by the European Research Council (award number: ERC AdG-GA323030, “DAMONA” project) and the Fonds de la Recherche Scientifique-FNRS (F.R.S.-FNRS) under Grant T.0080.20 (“LoCO motifs” research project). We thank Erik Mullaart and CRV (Arnhem, The Netherlands) for providing the samples. Tom Druet is Senior Research Associate from the Fonds de la Recherche Scientifique - FNRS (F.R.S.-FNRS). Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique - FNRS (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region. The authors also acknowledge use of the GIGA high performance computing cluster for conducting the study reported in this paper.
Available on ORBi :
since 23 April 2022

Statistics


Number of views
70 (13 by ULiège)
Number of downloads
33 (3 by ULiège)

Scopus citations®
6
Scopus citations® without self-citations
4
OpenCitations
1
OpenAlex citations
6
