[en] BACKGROUND: Structural variants (SVs) are chromosomal segments that differ between genomes, such as deletions, duplications, insertions, inversions and translocations. The genomics revolution enabled the discovery of sub-microscopic SVs via array and whole-genome sequencing (WGS) data, paving the way to unravel the functional impact of SVs. Recent human expression QTL mapping studies demonstrated that SVs play a disproportionately large role in altering gene expression, underlining the importance of including SVs in genetic analyses. Therefore, this study aimed to generate and explore a high-quality bovine SV catalogue by exploiting data from a unique cattle family cohort (266 samples in total, forming 127 trios).
RESULTS: We curated 13,731 SVs (> 50 bp) segregating in the population, consisting of 12,201 deletions, 1,509 duplications, and 21 multi-allelic CNVs. We validated a subset of the copy number variants (CNVs) using a direct genotyping approach in an independent cohort, which indicated that at least 62% of the CNVs are true variants segregating in the population. Among gene-disrupting SVs, we prioritised two likely high-impact duplications, encompassing the ORM1 and POPDC3 genes, respectively. Liver expression QTL mapping revealed that these duplications likely alter gene expression, confirming the functional importance of SVs. Although most of the accurately genotyped CNVs are tagged by single nucleotide polymorphisms (SNPs) ascertained in WGS data, most CNVs were not captured by individual SNPs from a 50K genotyping array.
CONCLUSION: We generated a high-quality SV catalogue by exploiting whole-genome sequence data from a unique bovine family cohort. Two high-impact duplications upregulating ORM1 and POPDC3 are putative candidates for postpartum feed intake and hoof health traits, respectively, and thus warrant further investigation. In general, CNVs were in low linkage disequilibrium (LD) with SNPs on the 50K array. Hence, it remains crucial to incorporate CNVs by means other than tagging SNPs, such as investigating tagging haplotypes, imputing CNVs directly, or genotyping them directly as done in the current study. The SV catalogue and the custom genotyping array generated in the current study will serve as valuable resources, accelerating utilisation of the full spectrum of genetic variants in bovine genomes.
Disciplines :
Genetics & genetic processes
Author, co-author :
Lee, Young Lim ; Université de Liège - ULiège > Département de gestion vétérinaire des Ressources Animales (DRA)
Bosse, Mirte; Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
Takeda, Haruko ; Université de Liège - ULiège > Département de gestion vétérinaire des Ressources Animales (DRA) > Génomique animale
Moreira, Gabriel Costa Monteiro; Unit of Animal Genomics, GIGA-R & Faculty of Veterinary Medicine, University of Liège, Liège, Belgium