The RADseq technology allows researchers to efficiently develop thousands of polymorphic
loci across multiple individuals with little or no prior information on the genome.
However, many questions remain about the biases inherent to this technology.
Notably, sequence misalignments arising from paralogy may affect the development of
single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity.
We evaluated the impact of putative paralogous loci on genetic diversity estimation during
the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia
pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative
paralogous RAD loci as a function of both the depth of coverage and the mismatch
threshold allowed between loci. Putative paralogy was detected in a highly variable
proportion of loci, from 1% to more than 20%, with the depth of coverage having a major
influence on the result. Putative paralogy artificially increased the observed degree of
polymorphism and the resulting estimates of diversity. The choice of depth-of-coverage
threshold also affected diversity estimation and SNP validation: a low threshold decreased
the chances of detecting minor alleles, whereas a high threshold increased allelic dropout.
SNP validation was better with the low threshold (4×) than with the high threshold (18×)
that we tested. Using the strategy developed here, we were able to validate more than 80%
of the SNPs tested by means of individual genotyping, resulting in a readily usable set
of 330 SNPs, suitable for use in population genetics applications.
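
The effect of a low depth threshold on allele detection can be illustrated with a simple binomial model. The sketch below is not the procedure used in this study. It assumes a diploid heterozygote whose two alleles are sampled with equal probability on each read, and the function name and the `min_reads_per_allele` parameter are hypothetical, introduced only for this illustration.

```python
from math import comb


def prob_both_alleles_seen(depth: int, min_reads_per_allele: int = 1) -> float:
    """Probability that both alleles of a heterozygote are each observed at
    least `min_reads_per_allele` times among `depth` reads.

    Assumption (not from the study): each read samples either allele with
    probability 0.5, independently of the other reads.
    """
    # Count the read configurations in which allele A appears k times and
    # allele B appears depth - k times, with both counts above the minimum.
    favourable = sum(
        comb(depth, k)
        for k in range(min_reads_per_allele, depth - min_reads_per_allele + 1)
    )
    return favourable / 2 ** depth


if __name__ == "__main__":
    # The two depth thresholds mentioned in the abstract.
    for depth in (4, 18):
        print(depth, prob_both_alleles_seen(depth))
```

Under these assumptions, a locus covered by exactly 4 reads shows both alleles with probability 0.875, so roughly one heterozygote in eight would be miscalled as a homozygote, whereas at 18 reads the probability is essentially 1. The model does not capture the opposite effect reported above, where a high threshold discards genotypes with insufficient coverage.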
Agriculture & agronomy
Genetics & genetic processes
Phytobiology (plant sciences, forestry, mycology...)