Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications-Reference-Cited by-同舟云学术

Benchmarking taxonomic classifiers with Illumina and Nanopore sequence data for clinical metagenomic diagnostic applications

Published:2022-01-13 Issue: Volume: Page:
ISSN:
Container-title:
language:
Short-container-title:

Author:

Govender Kumeren N.^ORCID,Eyre David W.

Abstract

AbstractCulture-independent metagenomic detection of microbial species has the potential to provide rapid and precise real-time diagnostic results. However, it is potentially limited by sequencing and classification errors. We use simulated and real-world data to benchmark rates of species misclassification using 100 reference genomes for each of ten common bloodstream pathogens and six frequent blood culture contaminants (n=1600). Simulating both with and without sequencing error for both the Illumina and Oxford Nanopore platforms, we evaluated commonly used classification tools including Kraken2, Bracken, and Centrifuge, utilising mini (8GB) and standard (30-50GB) databases. Bracken with the standard database performed best, the median percentage of reads across both sequencing platforms identified correctly to the species level was 98.46% (IQR 93.0:99.3) [range 57.1:100]. For Kraken2 with a mini database, a commonly used combination, median species-level identification was 79.3% (IQR 39.1:88.8) [range 11.2:100]. Classification performance varied by species, with E. coli being more challenging to classify correctly (59.4% to 96.4% reads with correct species, varying by tool used). By filtering out shorter Nanopore reads (<3500bp) we found performance similar or superior to Illumina sequencing, despite higher sequencing error rates. Misclassification was more common when the misclassified species had a higher average nucleotide identity to the true species. Our findings highlight taxonomic misclassification of sequencing data occurs and varies by sequencing and analysis workflow. This “bioinformatic contamination” should be accounted for in metagenomic pipelines to ensure accurate results that can support clinical decision making.ImportanceMetagenomics may transform clinical microbiology by enabling more rapid species detection in a potentially unbiased manner and reducing reliance on culture-based approaches. However, it is still limited by ongoing challenges such as sequencing and classification software errors. In this study, we use simulated and real-world data to define the intrinsic rates of species misclassification that occur using Illumina and Oxford Nanopore sequencing platforms with commonly used taxonomic classification tools and databases. We quantify the extent of “bioinformatic contamination” arising from the classification process. This enables us to identify the best performing tools that maximize classification accuracy, and to suggest how taxonomic misclassification can be formally accounted for in clinical diagnostic workflows. Specifically, we specify thresholds for identifying or excluding polymicrobial infections in metagenomic samples, based on rates of misclassification of similar species, which might have clinical implications when treating infection.

Publisher

Cold Spring Harbor Laboratory

Reference36 articles.

1. Govender KN , Street TL , Sanderson ND , Eyre DW . 2021. Metagenomic sequencing as a pathogen-agnostic clinical diagnostic tool for infectious diseases: a systematic review and meta-analysis of diagnostic test accuracy studies. J Clin Microbiol JCM-02916.

2. Dynamic regulation of genome-wide pre-mRNA splicing and stress tolerance by the Sm-like protein LSm5 in Arabidopsis

3. Performance of neural network basecalling tools for Oxford Nanopore sequencing

4. Bracken: estimating species abundance in metagenomics data

5. GATK PathSeq: a customizable computational tool for the discovery and identification of microbial sequences in libraries from eukaryotic hosts;Bioinformatics,2018

Cited by 2 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Sensitivity and consistency of long- and short-read metagenomics and epicPCR for the detection of antibiotic resistance genes and their bacterial hosts in wastewater;Journal of Hazardous Materials;2024-05

2. Sensitivity and consistency of long- and short-read metagenomics and epicPCR for the detection of antibiotic resistance genes and their bacterial hosts in wastewater;2023-08-13