2017 • In Kurmayer, Rainer; Sivonen, Kaarina; Wilmotte, Annick et al. (Eds.) Molecular Tools for the Detection and Quantification of Toxigenic Cyanobacteria
Amplicon sequencing can be a powerful approach for detecting toxic cyanobacteria
or any other kind of microorganism during monitoring programs. However, owing to
the huge size of next-generation sequencing (NGS) datasets (up to several Gb), there
is an obvious need for semi-automatic data processing and statistical analysis, as well
as visualization of the patterns found. Importantly, raw NGS data contain errors, some
of which are easily detected (e.g. too short or low-quality reads), while others remain
hidden even after the most stringent quality controls (e.g. chimeras, contaminants, and
reads with large insertions or deletions, referred to as “indels”). As a consequence, NGS
data need to be interpreted with caution, and bioinformatic analyses with poor
error detection can easily lead to erroneous conclusions. Hence, a crucial step in
the analysis of NGS data is the detection and removal of as many erroneous reads as
possible. Moreover, bioinformatic analysis involves additional preprocessing steps, including
demultiplexing (i.e. assigning reads to samples according to their barcode sequence), removing
non-biological tags together with adaptor and primer sequences, and removing
chimeric sequences. In addition, bioinformatics pipelines enable the quality-filtered
sequences to be clustered into biologically relevant operational taxonomic units (OTUs),
which form the basis of the statistical analysis, including the calculation of alpha- and
beta-diversity.
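
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the pipeline used in the chapter: the barcodes, sample names, reads and the `min_len` threshold are invented for illustration, and identical sequences are simply counted as one unit (dereplication) as a crude stand-in for OTU clustering, which in practice usually groups reads by sequence similarity and is preceded by chimera removal. The Shannon index is used as one common alpha-diversity measure and Bray-Curtis dissimilarity as one common beta-diversity measure.

```python
import math
from collections import Counter


def demultiplex(reads, barcodes, min_len=5):
    """Assign reads to samples by their leading barcode.

    The barcode is stripped from each read, and reads whose remaining
    sequence is shorter than min_len (a hypothetical threshold) or that
    match no barcode are discarded.
    """
    samples = {name: [] for name in barcodes.values()}
    for read in reads:
        for barcode, name in barcodes.items():
            if read.startswith(barcode):
                insert = read[len(barcode):]
                if len(insert) >= min_len:
                    samples[name].append(insert)
                break
    return samples


def shannon(counts):
    """Shannon index (natural log) of a unit -> abundance mapping."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two unit -> abundance mappings."""
    units = set(a) | set(b)
    num = sum(abs(a.get(u, 0) - b.get(u, 0)) for u in units)
    den = sum(a.values()) + sum(b.values())
    return num / den if den else 0.0


# Invented toy data: two samples tagged with 4-nt barcodes.
barcodes = {"ACGT": "lake_A", "TGCA": "lake_B"}
reads = [
    "ACGTAAAAACCCCC",
    "ACGTAAAAACCCCC",
    "ACGTGGGGGTTTTT",
    "TGCAAAAAACCCCC",
    "TGCAAA",          # too short once the barcode is removed
    "NNNNAAAAACCCCC",  # no matching barcode
]

samples = demultiplex(reads, barcodes)
# Dereplication: each distinct sequence is treated as one unit.
tables = {name: Counter(seqs) for name, seqs in samples.items()}

alpha = {name: shannon(table) for name, table in tables.items()}
beta = bray_curtis(tables["lake_A"], tables["lake_B"])
print(alpha, beta)
```

Even in this toy case, the two discarded reads show how filtering choices change the abundance table on which every downstream diversity estimate depends.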
F.R.S.-FNRS - Fonds de la Recherche Scientifique; BELSPO - SPP Politique scientifique - Service Public Fédéral de Programmation Politique scientifique; COST - European Cooperation in Science and Technology