Optimized quantification of intra-host viral diversity in SARS-CoV-2 and influenza virus sequence data

Author:

Roder A. E.1, Johnson K. E. E.1,2, Knoll M.2, Khalfan M.2, Wang B.2, Schultz-Cherry S.3, Banakis S.1, Kreitman A.1, Mederos C.1, Youn J.-H.4, Mercado R.4, Wang W.1, Chung M.1, Ruchnewitz D.5, Samanovic M. I.6, Mulligan M. J.6, Lässig M.5, Luksza M.7, Das S.4, Gresham D.2, Ghedin E.1,2

Affiliation:

1. Systems Genomics Section, Laboratory of Parasitic Diseases, DIR, NIAID, NIH, Bethesda, Maryland, USA

2. Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York, USA

3. Department of Infectious Diseases, St. Jude Children's Research Hospital, Memphis, Tennessee, USA

4. Department of Laboratory Medicine, NIH, Bethesda, Maryland, USA

5. Institute for Biological Physics, University of Cologne, Cologne, Germany

6. Department of Medicine, New York University Langone Vaccine Center, New York, New York, USA

7. Department of Oncological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA

Abstract

High error rates of viral RNA-dependent RNA polymerases lead to diverse intra-host viral populations during infection. Errors made during replication that are not strongly deleterious to the virus can lead to the generation of minority variants. However, accurate detection of minority variants in viral sequence data is complicated by errors introduced during sample preparation and data analysis. We used synthetic RNA controls and simulated data to test seven variant-calling tools across a range of allele frequencies and simulated coverages. We show that choice of variant caller and use of replicate sequencing have the most significant impact on single-nucleotide variant (SNV) discovery and demonstrate how both allele frequency and coverage thresholds impact both false discovery and false-negative rates. When replicates are not available, using a combination of multiple callers with more stringent cutoffs is recommended. We use these parameters to find minority variants in sequencing data from SARS-CoV-2 clinical specimens and provide guidance for studies of intra-host viral diversity using either single replicate data or data from technical replicates. Our study provides a framework for rigorous assessment of technical factors that impact SNV identification in viral samples and establishes heuristics that will inform and improve future studies of intra-host variation, viral diversity, and viral evolution.

Importance

When viruses replicate inside a host cell, the virus replication machinery makes mistakes. Over time, these mistakes create mutations that result in a diverse population of viruses inside the host. Mutations that are neither lethal to the virus nor strongly beneficial can lead to minority variants that are minor members of the virus population. However, preparing samples for sequencing can also introduce errors that resemble minority variants, resulting in the inclusion of false-positive data if not filtered correctly. In this study, we aimed to determine the best methods for identification and quantification of these minority variants by testing the performance of seven commonly used variant-calling tools. We used simulated and synthetic data to test their performance against a true set of variants and then used these studies to inform variant identification in data from SARS-CoV-2 clinical specimens. Together, analyses of our data provide extensive guidance for future studies of viral diversity and evolution.
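
As an illustration of the kind of evaluation the abstract describes, the following minimal Python sketch filters SNV calls by allele frequency and coverage, keeps variants supported by a minimum number of callers (or replicates), and scores the result against a known truth set. It is not the authors' pipeline: the `SNV` record, the function names, the example calls, and the threshold values (1% and 3% allele frequency, 100x coverage) are all hypothetical and chosen only to show how cutoffs trade false discoveries against false negatives.

```python
from collections import Counter
from typing import NamedTuple


class SNV(NamedTuple):
    """One variant call (hypothetical record layout, for illustration only)."""
    position: int
    alt: str
    allele_freq: float  # fraction of reads supporting the alternate allele
    depth: int          # read coverage at the position


def filter_calls(calls, min_af, min_depth):
    """Keep (position, alt) keys of calls passing both thresholds."""
    return {
        (c.position, c.alt)
        for c in calls
        if c.allele_freq >= min_af and c.depth >= min_depth
    }


def consensus(call_sets, min_support):
    """Keep variants present in at least `min_support` call sets
    (call sets may come from different callers or from replicates)."""
    counts = Counter(key for s in call_sets for key in s)
    return {key for key, n in counts.items() if n >= min_support}


def error_rates(called, truth):
    """Return (false discovery rate, false-negative rate) against a truth set."""
    tp = len(called & truth)
    fp = len(called - truth)
    fn = len(truth - called)
    fdr = fp / (tp + fp) if called else 0.0
    fnr = fn / (tp + fn) if truth else 0.0
    return fdr, fnr


# Made-up truth set and calls from two hypothetical callers.
truth = {(100, "A"), (200, "G"), (300, "T")}
caller_a = [
    SNV(100, "A", 0.05, 500),
    SNV(200, "G", 0.02, 800),
    SNV(400, "C", 0.011, 150),  # false positive at low frequency
]
caller_b = [
    SNV(100, "A", 0.06, 500),
    SNV(300, "T", 0.03, 90),    # true variant at low coverage
    SNV(400, "C", 0.012, 150),
]

# Lenient cutoffs, variant must be reported by both callers.
lenient = consensus(
    [filter_calls(c, min_af=0.01, min_depth=100) for c in (caller_a, caller_b)],
    min_support=2,
)
# More stringent allele-frequency cutoff, same consensus rule.
strict = consensus(
    [filter_calls(c, min_af=0.03, min_depth=100) for c in (caller_a, caller_b)],
    min_support=2,
)

lenient_fdr, lenient_fnr = error_rates(lenient, truth)
strict_fdr, strict_fnr = error_rates(strict, truth)
```

In this toy data the stricter allele-frequency cutoff removes the shared false positive at position 400 while leaving the false-negative rate unchanged. Real data will behave differently, and appropriate thresholds depend on the caller, the coverage, and whether replicates are available.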

Funder

Division of Intramural Research, National Institute of Allergy and Infectious Diseases

HHS | NIH | National Institute of Allergy and Infectious Diseases

HHS | NIH | National Institute of General Medical Sciences

Publisher

American Society for Microbiology

Subject

Virology, Microbiology

Cited by 2 articles.