QuasiSeq: profiling viral quasispecies via self-tuning spectral clustering with PacBio long sequencing reads

Author:

Jiao Xiaoli (1), Imamichi Hiromi (2), Sherman Brad T (1), Nahar Rishub (1), Dewar Robin L (3), Lane H Clifford (2), Imamichi Tomozumi (1), Chang Weizhong (1)

Affiliation:

1. Laboratory of Human Retrovirology and Immunoinformatics, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA

2. Laboratory of Immunoregulation, National Institute of Allergy and Infectious Diseases, Bethesda, MD 20892, USA

3. Virus Isolation and Serology Laboratory, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA

Abstract

Motivation: The existence of quasispecies in a viral population complicates disease prevention and treatment. High-throughput sequencing provides an opportunity to detect rare quasispecies, and long sequencing reads that cover full genomes reduce quasispecies determination to a clustering problem. The challenge is the high similarity among quasispecies combined with the high error rate of long sequencing reads.

Results: We developed QuasiSeq, which uses a novel signature-based self-tuning clustering method, SigClust, to profile viral mixtures with high accuracy and sensitivity. QuasiSeq can correctly identify quasispecies even from low-quality sequencing reads (accuracy <80%) and produces quasispecies sequences with high accuracy (≥99.55%). With high-quality circular consensus sequencing reads, QuasiSeq can produce quasispecies sequences with 100% accuracy. QuasiSeq has higher sensitivity and specificity than similar published software. Moreover, the computational resource requirement can be controlled through the size of the signature, which makes it possible to handle large sequencing datasets for rare quasispecies discovery. Parallel computation is implemented to process the clusters and further reduce the runtime. Finally, we developed a web interface for the QuasiSeq workflow with simple parameter settings based on the quality of the sequencing data, making it easy to use for users without advanced data science skills.

Availability and implementation: QuasiSeq is open source and freely available at https://github.com/LHRI-Bioinformatics/QuasiSeq. The current release (v1.0.0) is archived and available at https://zenodo.org/badge/latestdoi/340494542.

Supplementary information: Supplementary data are available at Bioinformatics online.
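To make the idea of "quasispecies determination as a clustering problem" concrete, the following is a minimal sketch of generic self-tuning spectral clustering applied to toy reads. It is not the SigClust implementation, and everything in it is an assumption for illustration: the reads are invented, equal-length and already aligned; all columns are compared (no signature selection is modelled); the affinity uses local scaling from each read's k-th nearest neighbour; the number of clusters is chosen by a simple eigengap heuristic; and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def hamming_matrix(reads):
    """Pairwise Hamming distances between equal-length, pre-aligned reads."""
    arr = np.array([list(r) for r in reads])
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2).astype(float)


def local_scaling_affinity(dist, k=2):
    """Self-tuning affinity: sigma_i is the distance to the k-th nearest neighbour."""
    # Column 0 of each sorted row is the read itself (distance 0).
    sigma = np.maximum(np.sort(dist, axis=1)[:, k], 1e-9)
    aff = np.exp(-dist ** 2 / np.outer(sigma, sigma))
    np.fill_diagonal(aff, 0.0)
    return aff


def spectral_decomposition(aff):
    """Eigenpairs of D^-1/2 A D^-1/2, sorted by decreasing eigenvalue."""
    d_inv_sqrt = 1.0 / np.sqrt(aff.sum(axis=1))
    vals, vecs = np.linalg.eigh(aff * np.outer(d_inv_sqrt, d_inv_sqrt))
    return vals[::-1], vecs[:, ::-1]


def pick_n_clusters(vals):
    """Eigengap heuristic (an assumed simplification): largest drop in the spectrum."""
    gaps = vals[:-1] - vals[1:]
    return int(np.argmax(gaps)) + 1


def kmeans(points, n_clusters, n_iter=20):
    """Small deterministic k-means with farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, n_clusters):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(nearest)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_reads(reads, k=2):
    """Cluster reads; returns (labels, number of clusters found)."""
    aff = local_scaling_affinity(hamming_matrix(reads), k)
    vals, vecs = spectral_decomposition(aff)
    n = pick_n_clusters(vals)
    emb = vecs[:, :n]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # row-normalise
    return kmeans(emb, n), n


def consensus(reads, labels):
    """Majority-vote consensus sequence for each cluster."""
    out = []
    for lab in sorted(set(labels.tolist())):
        members = [r for r, l in zip(reads, labels) if l == lab]
        out.append("".join(Counter(col).most_common(1)[0][0] for col in zip(*members)))
    return out


# Toy data (invented): two variants differing at 3 of 12 positions,
# four reads each, some carrying a single simulated sequencing error.
reads = [
    "ACGTACGTACGT", "ACGTACGTACGT", "ACGTACGTACGA", "ACCTACGTACGT",
    "ACGAACCTACTT", "ACGAACCTACTT", "TCGAACCTACTT", "ACGAACCTAGTT",
]
labels, n_clusters = cluster_reads(reads)
haplotypes = consensus(reads, labels)
```

In this sketch the per-cluster consensus step is what removes the isolated read errors: once the reads are grouped correctly, a column-wise majority vote recovers each variant's sequence. Real long reads would first need alignment to a reference, and a practical tool would restrict the comparison to informative positions to keep the pairwise distance matrix tractable.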

Funder

National Cancer Institute, National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

