TY - JOUR
T1 - Validation strategy of a bioinformatics whole genome sequencing workflow for Shiga toxin-producing Escherichia coli using a reference collection extensively characterized with conventional methods
AU - Bogaerts, Bert
AU - Nouws, Stéphanie
AU - Verhaegen, Bavo
AU - Denayer, Sarah
AU - Van Braekel, Julien
AU - Winand, Raf
AU - Fu, Qiang
AU - Crombé, Florence
AU - Piérard, Denis
AU - Marchal, Kathleen
AU - Roosens, Nancy H C
AU - De Keersmaecker, Sigrid C J
AU - Vanneste, Kevin
PY - 2021/3
Y1 - 2021/3
N2 - Whole genome sequencing (WGS) enables complete characterization of bacterial pathogenic isolates at single nucleotide resolution, making it the ultimate tool for routine surveillance and outbreak investigation. The lack of standardization, and the variation regarding bioinformatics workflows and parameters, however, complicates interoperability among (inter)national laboratories. We present a validation strategy applied to a bioinformatics workflow for Illumina data that performs complete characterization of Shiga toxin-producing Escherichia coli (STEC) isolates including antimicrobial resistance prediction, virulence gene detection, serotype prediction, plasmid replicon detection and sequence typing. The workflow supports three commonly used bioinformatics approaches for the detection of genes and alleles: alignment with blast+, kmer-based read mapping with KMA, and direct read mapping with SRST2. A collection of 131 STEC isolates collected from food and human sources, extensively characterized with conventional molecular methods, was used as a validation dataset. Using a validation strategy specifically adopted to WGS, we demonstrated high performance with repeatability, reproducibility, accuracy, precision, sensitivity and specificity above 95 % for the majority of all assays. The WGS workflow is publicly available as a 'push-button' pipeline at https://galaxy.sciensano.be. Our validation strategy and accompanying reference dataset consisting of both conventional and WGS data can be used for characterizing the performance of various bioinformatics workflows and assays, facilitating interoperability between laboratories with different WGS and bioinformatics set-ups.
AB - Whole genome sequencing (WGS) enables complete characterization of bacterial pathogenic isolates at single nucleotide resolution, making it the ultimate tool for routine surveillance and outbreak investigation. The lack of standardization, and the variation regarding bioinformatics workflows and parameters, however, complicates interoperability among (inter)national laboratories. We present a validation strategy applied to a bioinformatics workflow for Illumina data that performs complete characterization of Shiga toxin-producing Escherichia coli (STEC) isolates including antimicrobial resistance prediction, virulence gene detection, serotype prediction, plasmid replicon detection and sequence typing. The workflow supports three commonly used bioinformatics approaches for the detection of genes and alleles: alignment with blast+, kmer-based read mapping with KMA, and direct read mapping with SRST2. A collection of 131 STEC isolates collected from food and human sources, extensively characterized with conventional molecular methods, was used as a validation dataset. Using a validation strategy specifically adopted to WGS, we demonstrated high performance with repeatability, reproducibility, accuracy, precision, sensitivity and specificity above 95 % for the majority of all assays. The WGS workflow is publicly available as a 'push-button' pipeline at https://galaxy.sciensano.be. Our validation strategy and accompanying reference dataset consisting of both conventional and WGS data can be used for characterizing the performance of various bioinformatics workflows and assays, facilitating interoperability between laboratories with different WGS and bioinformatics set-ups.
KW - Escherichia coli
KW - STEC
KW - foodborne pathogens
KW - public health
KW - validation
KW - whole genome sequencing
UR - http://www.scopus.com/inward/record.url?scp=85103593315&partnerID=8YFLogxK
U2 - 10.1099/mgen.0.000531
DO - 10.1099/mgen.0.000531
M3 - Article
C2 - 33656437
VL - 7
JO - Microbial Genomics
JF - Microbial Genomics
SN - 2057-5858
IS - 3
M1 - 000531
ER -