We report the generation of an organism-wide catalog of 976,813 cis-acting regulatory elements for the bovine, detected by the assay for transposase-accessible chromatin using sequencing (ATAC-seq). We group these regulatory elements into 16 components by nonnegative matrix factorization. Three lines of evidence support their regulatory potential: correlation between the genome-wide density of peaks and transcription start sites, correlation between peak accessibility and expression of neighboring genes, and enrichment in transcription factor binding motifs. Using a previously established catalog of 12,736,643 variants, we show that the proportion of single-nucleotide polymorphisms mapping to ATAC-seq peaks is higher than expected and that this is due to an approximately 1.3-fold higher mutation rate within peaks. Their site frequency spectrum indicates that variants in ATAC-seq peaks are subject to purifying selection. We generate eQTL data sets for liver and blood and show that variants that drive eQTL fall into liver- and blood-specific ATAC-seq peaks more often than expected by chance. We combine ATAC-seq and eQTL data to estimate that the proportion of regulatory variants mapping to ATAC-seq peaks is approximately one in three and that the proportion of variants mapping to ATAC-seq peaks that are regulatory is approximately one in 25. We discuss the implications of these findings for the utility of ATAC-seq information in improving the accuracy of genomic selection.
Disciplines :
Genetics & genetic processes
Author, co-author :
Yuan, Can ; Université de Liège - ULiège > GIGA > GIGA Medical Genomics - Unit of Animal Genomics
Lopdell, Thomas ; Université de Liège - ULiège > GIGA > GIGA Administration ; Research and Development, Livestock Improvement Corporation, Hamilton 3240, New Zealand
Petrov, Vyacheslav A ; Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000 Liège, Belgium
Oget-Ebrad, Claire ; Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000 Liège, Belgium
Moreira, Gabriel Costa Monteiro ; Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000 Liège, Belgium
Gualdrón Duarte, José Luis ; Unit of Animal Genomics, GIGA-R and Faculty of Veterinary Medicine, University of Liège, 4000 Liège, Belgium
Sartelet, Arnaud ; Université de Liège - ULiège > Département d'Enseignement et de Clinique des animaux de Production (DCP)
Cheng, Zhangrui ; Royal Veterinary College, Hatfield, Herts AL9 7TA, United Kingdom
Salavati, Mazdak ; Royal Veterinary College, Hatfield, Herts AL9 7TA, United Kingdom
Wathes, D Claire ; Royal Veterinary College, Hatfield, Herts AL9 7TA, United Kingdom
Crowe, Mark A ; School of Veterinary Medicine, University College Dublin, Dublin 4, Ireland
FP7 - 613689 - GPLUSE - Genotype and Environment contributing to the sustainability of dairy cow production systems through the optimal integration of genomic selection and novel management protocols based on the development
Name of the research project :
Damona
Funders :
ERC - European Research Council
EU - European Union
Région wallonne
F.R.S.-FNRS - Fonds de la Recherche Scientifique
Funding number :
CAUSEL grant from the Walloon Region (no. 1710030)
Funding text :
We thank Calixte Bayrou, Ken Kusakabe, Ruth Appeltant, Anne-Sophie Van Laere, and all members of Michel Georges’ laboratory for their help with sample collection, technical support, and fruitful discussion. We are also grateful for the support provided by the GIGA Genomics and Bioinformatics core facilities. This work was funded by the Damona European Research Council advanced grant from the EU (AdG-GA323030), the GplusE FP7 grant from the EU (no. 613689), the CAUSEL grant from the Walloon Region (no. 1710030), and financial support from Inoveo. C.C. and T.D. are, respectively, senior research associate and research director of the Fonds de la Recherche Scientifique. Computational resources were provided by the Consortium des Équipements de Calcul Intensif, funded by the Fonds de la Recherche Scientifique de Belgique (no. 2.5020.11) and by the Walloon Region.