Abstract
The fast growth of public repositories of sequences greatly contributes to the success of metagenomics applications. However, they are growing at a much faster pace than the resources to use them properly. This challenges current methods, which struggle to take full advantage of the massive and fast data generation. We propose a generational leap in performance and usability with ganon2, a sequence classification method that performs taxonomic binning and profiling for metagenomics analysis. It indexes large datasets with a small memory footprint while maintaining fast, sensitive, and precise classification results. This is possible with the Hierarchical Interleaved Bloom Filter data structure paired with minimizers and several other improvements and optimizations. Based on the full NCBI RefSeq and its subsets, ganon2 indices are on average 50% smaller than those of state-of-the-art methods, providing a high compression rate for large and diverse genomic reference sets. Using 16 simulated samples from various studies, including the CAMI 1+2 challenge, ganon2 achieved up to 0.17 higher median F1-Score in taxonomic binning. In profiling, the median F1-Score improves by up to 0.32 while keeping a balanced L1-norm error in the abundance estimation. ganon2 is one of the fastest tools evaluated and enables the use of larger, more diverse, and up-to-date reference sets in daily microbiome analysis, improving the resolution of results. The code is open-source and available with documentation at https://github.com/pirovc/ganon
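To illustrate the two ideas named above, here is a minimal Python sketch of minimizer extraction feeding a flat Interleaved Bloom Filter (IBF). It is not ganon2's implementation and omits the hierarchical layout that defines the HIBF. Everything in it is a hypothetical illustration under stated assumptions: the function and class names, the 2-bit encoding with a multiplicative 64-bit hash, the double-hashing scheme, the use of forward (non-canonical) k-mers over an ACGT-only alphabet, and the parameters `k`, `w`, `size`, and `num_hashes`. In an IBF, each reference group ("bin") has its own Bloom filter, and the bits of all bins for a given hash position are stored next to each other. A single lookup therefore returns a bit vector of candidate bins, and summing these over a read's minimizers gives per-bin match counts.

```python
from typing import Dict, Iterable, List, Set

# Assumption: only A/C/G/T are handled (no N, no canonical k-mers).
ENC = {"A": 0, "C": 1, "G": 2, "T": 3}
MASK64 = (1 << 64) - 1


def kmer_hash(kmer: str) -> int:
    """2-bit encode a k-mer (k <= 32) and scramble it with a 64-bit multiplicative mix."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENC[base]
    return (value * 0x9E3779B97F4A7C15) & MASK64


def minimizers(seq: str, k: int, w: int) -> Set[int]:
    """Distinct minimizer hashes: the smallest k-mer hash in each window of w consecutive k-mers."""
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    return {min(hashes[s:s + w]) for s in range(len(hashes) - w + 1)}


class InterleavedBloomFilter:
    """Toy IBF: `bins` Bloom filters of `size` bits, interleaved so that position p
    of every bin lives in bits[p * bins : (p + 1) * bins]. One byte per bit, for clarity."""

    def __init__(self, bins: int, size: int, num_hashes: int = 2):
        self.bins, self.size, self.num_hashes = bins, size, num_hashes
        self.bits = bytearray(bins * size)

    def _positions(self, value: int) -> List[int]:
        # Hypothetical double hashing: derive num_hashes positions from one 64-bit hash.
        h1 = value & 0xFFFFFFFF
        h2 = (value >> 32) | 1
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def insert(self, bin_id: int, value: int) -> None:
        for pos in self._positions(value):
            self.bits[pos * self.bins + bin_id] = 1

    def query(self, value: int) -> List[bool]:
        # AND the interleaved slices of all hash positions -> candidate bins.
        hits = [True] * self.bins
        for pos in self._positions(value):
            row = self.bits[pos * self.bins:(pos + 1) * self.bins]
            hits = [h and bool(b) for h, b in zip(hits, row)]
        return hits

    def count(self, values: Iterable[int]) -> List[int]:
        # Number of query values reported as present in each bin.
        counts = [0] * self.bins
        for v in values:
            for b, hit in enumerate(self.query(v)):
                counts[b] += hit
        return counts


# Usage: two toy reference bins and one read sampled from bin 0.
refs: Dict[int, str] = {
    0: "ACGTACGTTGCAAGCTTAGCCGATCGATCGGA",
    1: "TTTTGGGGCCCCAAAATTTTGGGGCCCCAAAA",
}
ibf = InterleavedBloomFilter(bins=2, size=1024, num_hashes=2)
for bin_id, seq in refs.items():
    for m in minimizers(seq, k=5, w=4):
        ibf.insert(bin_id, m)

read = refs[0][3:25]
print(ibf.count(minimizers(read, k=5, w=4)))
```

Because every window inside the read is also a window of its source reference, every read minimizer was inserted into bin 0. Bloom filters have no false negatives, so bin 0 always matches all of them. Any hits in bin 1 can only be false positives. Interleaving is what lets a single memory slice answer the query for all bins at once.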
Publisher
Cold Spring Harbor Laboratory