Affiliation:
1. National Biodefense Analysis and Countermeasures Center, Fort Detrick, MD, USA
2. National Human Genome Research Institute, Bethesda, MD, USA
Abstract
When performing bioforensic casework, it is important to be able to reliably detect the presence of a particular organism in a metagenomic sample, even if the organism is only present in a trace amount. For this task, it is common to use a sequence classification program that determines the taxonomic affiliation of individual sequence reads by comparing them to reference database sequences. As metagenomic data sets often consist of millions or billions of reads that need to be compared to reference databases containing millions of sequences, such sequence classification programs typically use search heuristics and databases with reduced sequence diversity to speed up the analysis, which can lead to incorrect assignments. Thus, in a bioforensic setting where correct assignments are paramount, assignments of interest made by “first-pass” classifiers should be confirmed using the most precise methods and comprehensive databases available. In this study we present a BLAST-based method for validating the assignments made by less precise sequence classification programs, with optimal parameters for filtering of BLAST results determined via simulation of sequence reads from genomes of interest, and we apply the method to the detection of four pathogenic organisms. The software implementing the method is open source and freely available.
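As a rough illustration of the idea of filtering BLAST results to confirm first-pass assignments, the sketch below parses BLAST tabular output (`-outfmt 6`, default twelve columns) and keeps a read only if every top-scoring hit that survives filtering belongs to the organism of interest. This is not the authors' software. The function name `confirm_reads`, the threshold values, and the tie-handling rule are assumptions made for illustration; the abstract states only that optimal filtering parameters were determined by simulating reads from the genomes of interest.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def confirm_reads(blast_tsv, target_subjects,
                  min_pident=97.0, min_length=100, max_evalue=1e-10):
    """Return {read_id: bool} for reads with at least one hit passing the filters.

    A read is confirmed (True) only if all of its best-scoring filtered hits
    are in target_subjects. The thresholds are hypothetical placeholders,
    not values taken from the paper.
    """
    hits_by_read = defaultdict(list)  # read id -> [(bitscore, subject id), ...]
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["pident"]) < min_pident:
            continue
        if int(hit["length"]) < min_length:
            continue
        if float(hit["evalue"]) > max_evalue:
            continue
        hits_by_read[hit["qseqid"]].append((float(hit["bitscore"]), hit["sseqid"]))

    confirmed = {}
    for read_id, hits in hits_by_read.items():
        top_score = max(score for score, _ in hits)
        top_subjects = {subject for score, subject in hits if score == top_score}
        # A tie with a non-target sequence leaves the assignment unconfirmed.
        confirmed[read_id] = top_subjects <= set(target_subjects)
    return confirmed


# Made-up hits for three reads; "targetA" and "nearB" are hypothetical subject IDs.
sample = "\n".join([
    "read1\ttargetA\t99.0\t150\t1\t0\t1\t150\t10\t159\t1e-70\t270",
    "read1\tnearB\t95.0\t150\t7\t0\t1\t150\t10\t159\t1e-60\t240",
    "read2\ttargetA\t98.0\t150\t3\t0\t1\t150\t10\t159\t1e-65\t260",
    "read2\tnearB\t98.0\t150\t3\t0\t1\t150\t10\t159\t1e-65\t260",
    "read3\ttargetA\t99.0\t40\t0\t0\t1\t40\t10\t49\t1e-12\t75",
])
result = confirm_reads(sample, {"targetA"})
print(result)
```

In this toy input, a read whose best hit ties between the target and a near neighbor is left unconfirmed, and a read with no hit passing the filters is omitted from the result entirely. In practice the thresholds would be tuned per organism, for example on reads simulated from the genome of interest as the abstract describes.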
Funder
Department of Homeland Security (DHS) Science and Technology Directorate (S&T)
National Biodefense Analysis and Countermeasures Center (NBACC)
Subject
General Agricultural and Biological Sciences; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience
Cited by
17 articles.