Author:
Liao Herui, Ji Yongxin, Sun Yanni
Abstract
Because bacterial strains can exhibit different biological properties, strain-level composition analysis plays a vital role in understanding the functions and dynamics of microbial communities. Metagenomic sequencing has become the major means of probing the microbial composition of host-associated and environmental samples. Although many composition analysis tools exist, they are not optimized to address two challenges of strain-level analysis: a reference database containing highly similar strain genomes, and the presence of multiple strains of one species in a sample. In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between strain identification accuracy and computational complexity. We rigorously tested StrainScan on many simulated and real sequencing datasets and benchmarked it against popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and Strainest. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1-score by 20% in identifying multiple strains with at least 99.89% average nucleotide identity. StrainScan takes short reads and a set of reference strains as input, and its source code is freely available at https://github.com/liaoherui/strainScan.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.