Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions.

Druet, Tom; Macleod, I. M.; Hayes, B. J.

doi:10.1038/hdy.2013.13

Article (Scientific journals)

2014 • In Heredity, 112 (1), p. 39-47

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/2268/146117

DOI
10.1038/hdy.2013.13

PubMed
23549338

Files

Full Text

druet_heredity2013.pdf

Author preprint (1.34 MB)

All documents in ORBi are protected by a user license.

Details

Abstract :

Genomic prediction from whole-genome sequence data is attractive, as the accuracy of genomic prediction is no longer bounded by the extent of linkage disequilibrium between DNA markers and causal mutations affecting the trait, provided that the causal mutations are in the data set. A cost-effective strategy could be to sequence a small proportion of the population and impute sequence data to the rest of the reference population. Here, we describe strategies for selecting individuals for sequencing, based on either pedigree relationships or haplotype diversity. Performance of these strategies (number of variants detected and accuracy of imputation) was evaluated in sequence data simulated through a real Belgian Blue cattle pedigree. A strategy (AHAP), which selected a subset of individuals for sequencing that maximized the number of unique haplotypes (from single-nucleotide polymorphism panel data) sequenced, gave good performance across a range of variant minor allele frequencies. We then investigated the optimum number of individuals to sequence by fold coverage given a maximum total sequencing effort. At 600 total fold coverage (600x), the optimum strategy was to sequence 75 individuals at eightfold coverage. Finally, we investigated the accuracy of genomic predictions that could be achieved. The advantage of using imputed sequence data compared with dense SNP array genotypes was highly dependent on the allele frequency spectrum of the causative mutations affecting the trait. When this followed a neutral distribution, the advantage of the imputed sequence data was small; however, when the causal mutations all had low minor allele frequencies, using the sequence data improved the accuracy of genomic prediction by up to 30%.

Heredity advance online publication, 3 April 2013; doi:10.1038/hdy.2013.13.
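
The two design questions in the abstract (which individuals to sequence, and how many at what coverage) can be illustrated with a small sketch. This is not the authors' AHAP procedure, whose details are not given in this record; it is a generic greedy set-cover heuristic that repeatedly picks the individual carrying the most not-yet-covered haplotypes. The function names `greedy_select` and `designs_for_budget`, the haplotype labels, and the toy data are all hypothetical. The budget helper only restates the arithmetic that a fixed total fold coverage is split as individuals times coverage (for example, 600 = 75 x 8).

```python
from typing import Dict, List, Set, Tuple


def greedy_select(haplotypes: Dict[str, Set[str]], n_select: int) -> List[str]:
    """Pick up to n_select individuals, each time adding the one that
    contributes the most haplotypes not yet covered (greedy set cover).

    `haplotypes` maps a hypothetical individual ID to the set of
    haplotype labels that individual carries.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(haplotypes)
    while remaining and len(chosen) < n_select:
        # Iterate IDs in sorted order so ties resolve deterministically.
        best = max(sorted(remaining), key=lambda ind: len(remaining[ind] - covered))
        if not remaining[best] - covered:
            break  # nobody left adds a new haplotype
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen


def designs_for_budget(total_fold: int, coverages: List[int]) -> List[Tuple[int, int]]:
    """For a fixed total fold coverage, list (individuals, coverage) pairs."""
    return [(total_fold // c, c) for c in coverages]


# Toy data (assumed, not from the paper): four individuals, five haplotypes.
toy = {
    "A": {"h1", "h2"},
    "B": {"h2", "h3"},
    "C": {"h1", "h2"},
    "D": {"h4", "h5"},
}
selected = greedy_select(toy, 3)
designs = designs_for_budget(600, [4, 8, 12])
print(selected)
print(designs)
```

In this toy case individual C is never chosen because both of its haplotypes are already covered once A has been selected. Comparing the candidate designs for accuracy of imputation, as the study does, would require simulated sequence data and is outside the scope of this sketch.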

Disciplines :

Genetics & genetic processes
Agriculture & agronomy

Author, co-author :

Druet, Tom ; Université de Liège - ULiège > Département de productions animales > GIGA-R : Génomique animale

Macleod, I. M.

Hayes, B. J.

Language :

English

Title :

Toward genomic prediction from whole-genome sequence data: impact of sequencing design on genotype imputation and accuracy of predictions.

Publication date :

January 2014

Journal title :

Heredity

ISSN :

0018-067X

eISSN :

1365-2540

Publisher :

Nature, London, United Kingdom

Volume :

112

Issue :

1

Pages :

39-47

Peer reviewed :

Peer Reviewed verified by ORBi

Available on ORBi :

since 04 April 2013

Statistics

Number of views

133 (17 by ULiège)

Number of downloads

6 (6 by ULiège)

Scopus® citations

174

Scopus® citations without self-citations

152

OpenCitations

143

OpenAlex citations

220
