Keywords: black locust; depth of coverage; putative paralogy filtering; restriction site-associated DNA sequencing
Abstract:
The RADseq technology allows researchers to efficiently develop thousands of polymorphic
loci across multiple individuals with little or no prior information on the genome.
However, many questions remain about the biases inherent to this technology.
Notably, sequence misalignments arising from paralogy may affect the development of
single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity.
We evaluated the impact of putative paralogous loci on genetic diversity estimation during
the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia
pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative
paralogous RAD loci as a function of both the depth of coverage and the mismatch
threshold allowed between loci. Putative paralogy was detected in a highly variable
proportion of loci, from 1% to more than 20%, with the depth of coverage having a major
influence on the result. Putative paralogy artificially increased the observed degree of
polymorphism and the resulting estimates of diversity. The choice of the depth of coverage
also affected diversity estimation and SNP validation: A low threshold decreased
the chances of detecting minor alleles, while a high threshold increased allelic dropout.
Of the two thresholds tested, a higher proportion of SNPs was validated with the low
threshold (4×) than with the high threshold (18×). Using the strategy developed here, we
validated more than 80% of the SNPs tested by individual genotyping, resulting in a
readily usable set of 330 SNPs suitable for population genetics applications.
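The abstract reports that the proportion of putatively paralogous loci depends on both the depth of coverage and the mismatch threshold allowed between loci. The sketch below illustrates one plausible way such a dependence can arise; it is not the pipeline used in the study. It assumes (hypothetically) that reads are trimmed to equal length, that a distinct read sequence is kept as an allele only if it reaches a minimum depth (`min_depth`), that alleles within `max_mismatch` differences are greedily grouped into one locus (loosely analogous to the stack-depth and mismatch settings of de novo RAD assemblers), and that a locus is flagged when any diploid individual carries more than two alleles in it. All function names and the toy reads are illustrative.

```python
from collections import Counter
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Mismatches between two reads; assumes reads were trimmed to equal length."""
    return sum(x != y for x, y in zip(a, b))


def cluster_loci(alleles: List[str], max_mismatch: int) -> List[List[str]]:
    """Greedy clustering: an allele joins the first locus holding any member
    within max_mismatch differences, otherwise it starts a new locus."""
    loci: List[List[str]] = []
    for allele in alleles:
        for locus in loci:
            if any(hamming(allele, member) <= max_mismatch for member in locus):
                locus.append(allele)
                break
        else:
            loci.append([allele])
    return loci


def paralog_fraction(reads: Dict[str, List[str]], min_depth: int, max_mismatch: int) -> float:
    """Fraction of loci in which at least one diploid individual shows > 2 alleles."""
    # Per individual, keep only distinct read sequences seen at least min_depth times.
    kept = {ind: {seq for seq, n in Counter(rs).items() if n >= min_depth}
            for ind, rs in reads.items()}
    # Pool retained alleles across individuals and cluster them into loci.
    loci = cluster_loci(sorted(set().union(*kept.values())), max_mismatch)
    # A diploid individual can carry at most two alleles at a single-copy locus.
    flagged = sum(
        any(len(set(locus) & alleles) > 2 for alleles in kept.values())
        for locus in loci
    )
    return flagged / len(loci) if loci else 0.0


# Hypothetical toy reads (8 bp) for two individuals.
reads = {
    "ind1": ["AAAACCCC"] * 5 + ["AAAACCCG"] * 5 + ["AAAACCTC"] * 4
            + ["GGGGTTTT"] * 6 + ["GGGGTTTA"] * 6,
    "ind2": ["AAAACCCC"] * 8 + ["GGGGTTTT"] * 10,
}

for min_depth, max_mismatch in [(3, 1), (5, 1), (3, 0)]:
    print(min_depth, max_mismatch, paralog_fraction(reads, min_depth, max_mismatch))
# prints "3 1 0.5", then "5 1 0.0", then "3 0 0.0"
```

In this toy case, raising the depth threshold discards the third low-depth allele of `ind1`, and setting the mismatch threshold to zero splits it into its own locus; either change removes the paralogy flag, which mirrors, in miniature, the sensitivity described above.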
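Why putative paralogy inflates diversity estimates can be seen with a small hypothetical case: two paralogous copies that are each monomorphic but differ by one fixed base. If both copies are collapsed into a single RAD locus, every individual appears heterozygous. The sketch assumes Nei's unbiased gene diversity, n/(n - 1) × (1 - Σ p²) with n the number of sampled gene copies, as the diversity measure; the abstract does not state which estimator was used, and the nine-individual toy data are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[str, str]


def gene_diversity(genotypes: List[Genotype]) -> float:
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies),
    where n is the number of sampled gene copies (2 per diploid individual)."""
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    sum_p2 = sum((c / n) ** 2 for c in Counter(alleles).values())
    return n / (n - 1) * (1 - sum_p2)


def observed_heterozygosity(genotypes: List[Genotype]) -> float:
    """Proportion of individuals whose two alleles differ."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Hypothetical case: two paralogous copies, each monomorphic across nine individuals,
# differ by one fixed base (C in copy A, T in copy B).
copy_a = [("C", "C")] * 9
copy_b = [("T", "T")] * 9
# Collapsed into one RAD locus, every individual looks like a C/T heterozygote.
collapsed = [("C", "T")] * 9

for name, g in [("copy A", copy_a), ("copy B", copy_b), ("collapsed", collapsed)]:
    print(name, round(gene_diversity(g), 3), observed_heterozygosity(g))
```

Each true copy has zero diversity, whereas the collapsed locus yields a gene diversity of 9/17 (about 0.53) and an observed heterozygosity of 1.0. A fixed excess of heterozygotes of this kind is one signature that paralogy filters commonly look for.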
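The trade-off between depth thresholds and allelic dropout can be approximated with a simple binomial model. This is a sketch under strong assumptions, not the authors' model: it treats the coverage threshold as a minimum read count required for each allele (one possible reading of the 4× and 18× thresholds), assumes both alleles of a heterozygote are sampled with probability 0.5 per read, and ignores sequencing error. Under these assumptions, a heterozygote is called correctly only if each allele reaches the threshold, so a high threshold requires much deeper coverage to avoid dropout.

```python
from math import comb


def dropout_probability(depth: int, min_allele_depth: int) -> float:
    """Probability that a true heterozygote is not called heterozygous because
    one allele falls below min_allele_depth reads, assuming each of `depth`
    reads samples either allele with probability 0.5 and no sequencing error."""
    # Both alleles are retained iff the count k of allele 1 satisfies
    # min_allele_depth <= k <= depth - min_allele_depth (empty range -> certain dropout).
    both_kept = sum(
        comb(depth, k) for k in range(min_allele_depth, depth - min_allele_depth + 1)
    )
    return 1 - both_kept / 2 ** depth


# Compare a low and a high per-allele threshold at a few hypothetical total depths.
for depth in (10, 20, 40):
    for threshold in (4, 18):
        print(depth, threshold, round(dropout_probability(depth, threshold), 3))
```

With an 18-read threshold, a heterozygote cannot be recovered at all below 36 reads of total depth, and even at exactly 36 reads it is lost most of the time; the 4-read threshold tolerates far lower depth. Real data add sequencing error and uneven allele amplification, which this sketch leaves out.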