Mash Screen: high-throughput sequence containment estimation for genome discovery

Published: 2019-11-05 Issue: 1 Volume: 20
ISSN: 1474-760X
Container-title: Genome Biology
Language: en
Short-container-title: Genome Biol

Author:

Ondov Brian D., Starrett Gabriel J., Sappington Anna, Kostic Aleksandra, Koren Sergey, Buck Christopher B., Phillippy Adam M.

Abstract

The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here, we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome and demonstrate the identification of a novel polyomavirus species from a public metagenome.
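To illustrate the distinction the abstract draws between resemblance and containment, here is a minimal Python sketch. It is not the Mash Screen implementation: it compares exact k-mer sets rather than MinHash sketches, and the synthetic sequences, the k-mer size `K = 21`, and the helper names `kmers`, `jaccard`, and `containment` are all assumptions made for illustration.

```python
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Resemblance: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


def containment(a, b):
    """Containment of a in b: fraction of a's k-mers also found in b."""
    return len(a & b) / len(a)


# Synthetic data (illustrative only): a short "genome" embedded
# in a much larger random "metagenome".
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(1_000))
background = "".join(rng.choice("ACGT") for _ in range(50_000))
metagenome = background[:25_000] + genome + background[25_000:]

K = 21  # assumed k-mer size for this sketch
g, m = kmers(genome, K), kmers(metagenome, K)

print(f"resemblance: {jaccard(g, m):.3f}")
print(f"containment: {containment(g, m):.3f}")
```

Because the genome is embedded whole, every one of its k-mers occurs in the metagenome, so the containment is exactly 1. The resemblance stays small (on the order of a few percent here) because the union is dominated by the much larger metagenome. That gap is why a resemblance score alone says little about whether a genome is present in a metagenome.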

Publisher

Springer Science and Business Media LLC

Link

http://link.springer.com/content/pdf/10.1186/s13059-019-1841-x.pdf

References: 41 articles.

1. O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B, Smith-White B, Ako-Adjei D, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 2015; 44(D1):733–45.

2. RefSeq growth statistics. https://www.ncbi.nlm.nih.gov/genbank/statistics/ . Accessed 27 Feb 2019.

3. GenBank and WGS Statistics. http://www.ncbi.nlm.nih.gov/genbank/ . Accessed 27 Feb 2019.

4. Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res. 2011; 39(Database issue):19–21.

5. SRA database growth. https://www.ncbi.nlm.nih.gov/sra/docs/sragrowth/ . Accessed 27 Feb 2019.

Cited by 180 articles.

1. Occurrence of Campylobacter, Listeria monocytogenes, and extended-spectrum beta-lactamase Escherichia coli in slaughterhouses before and after cleaning and disinfection;Food Microbiology;2025-01

2. A survey of k-mer methods and applications in bioinformatics;Computational and Structural Biotechnology Journal;2024-12

3. CRISPR-AMRtracker: A novel toolkit to monitor the antimicrobial resistance gene transfer in fecal microbiota;Drug Resistance Updates;2024-11

4. Metagenomic functional profiling: to sketch or not to sketch?;Bioinformatics;2024-09-01

5. Genome sequence of Bacillus pumilus RI06-95 isolated during a Microcystis bloom in lake Champlain, USA;Microbiology Resource Announcements;2024-08-30