SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine-learning method

Author:

de Bernardi Schneider Adriano (1, 2), Su Michelle (3), Hinrichs Angie S (1), Wang Jade (3), Amin Helly (3), Bell John (4), Wadford Debra A (4), O’Toole Áine (5), Scher Emily (5), Perry Marc D (1), Turakhia Yatish (6), De Maio Nicola (7), Hughes Scott (3), Corbett-Detig Russ (1, 2)

Affiliation:

1. Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA

2. Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA

3. Department of Health and Mental Hygiene, New York City Public Health Laboratory, New York, NY 10016, USA

4. California Department of Public Health (CDPH), VRDL/COVIDNet, Richmond, CA 94804, USA

5. Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3FL, UK

6. Department of Electrical and Computer Engineering, University of California San Diego, San Diego, CA 92093, USA

7. European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SD, UK

Abstract

With the rapid spread and evolution of SARS-CoV-2, the ability to monitor its transmission and distinguish among viral lineages is critical for pandemic response efforts. The most commonly used software for the lineage assignment of newly isolated SARS-CoV-2 genomes is pangolin, which offers two methods of assignment, pangoLEARN and pUShER. PangoLEARN rapidly assigns lineages using a machine-learning algorithm, while pUShER performs a phylogenetic placement to identify the lineage corresponding to a newly sequenced genome. In a preliminary study, we observed that pangoLEARN (decision tree model), while substantially faster than pUShER, offered less consistency across different versions of pangolin v3. Here, we expand upon this analysis to include v3 and v4 of pangolin, which moved the default algorithm for lineage assignment from pangoLEARN in v3 to pUShER in v4, and perform a thorough analysis confirming that pUShER is not only more stable across versions but also more accurate. Our findings suggest that future lineage assignment algorithms for various pathogens should consider the value of phylogenetic placement.
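The notion of stability across versions used above can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the function name `stable_fraction`, the version labels, and the toy lineage calls are all hypothetical, and the sketch assumes the lineage calls from each software version have already been collected into a dictionary keyed by sequence ID.

```python
from typing import Mapping


def stable_fraction(calls_by_version: Mapping[str, Mapping[str, str]]) -> float:
    """Fraction of sequences that receive the same lineage in every version.

    calls_by_version maps a version label to {sequence_id: lineage}.
    Only sequences present in all versions are compared.
    """
    if not calls_by_version:
        raise ValueError("need lineage calls from at least one version")

    per_version = list(calls_by_version.values())
    # Restrict the comparison to sequences assigned by every version.
    shared = set.intersection(*(set(calls) for calls in per_version))
    if not shared:
        return 0.0

    # A sequence is stable if all versions agree on a single lineage.
    stable = sum(
        1 for seq_id in shared
        if len({calls[seq_id] for calls in per_version}) == 1
    )
    return stable / len(shared)


# Toy data (hypothetical, for illustration only).
toy_pangolearn = {
    "version_a": {"seq1": "B.1.1.7", "seq2": "B.1", "seq3": "AY.4"},
    "version_b": {"seq1": "B.1.1.7", "seq2": "B.1.2", "seq3": "AY.4"},
}
toy_pusher = {
    "version_a": {"seq1": "B.1.1.7", "seq2": "B.1.2", "seq3": "AY.4"},
    "version_b": {"seq1": "B.1.1.7", "seq2": "B.1.2", "seq3": "AY.4"},
}

print(stable_fraction(toy_pangolearn))
print(stable_fraction(toy_pusher))
```

In this toy input one of three sequences changes lineage between versions under the first method and none under the second. Accuracy would need a separate comparison against a trusted reference assignment, which this sketch does not attempt.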

Funder

Centers for Disease Control and Prevention - Epidemiology and Laboratory Capacity (ELC) for Infectious Diseases

Publisher

Oxford University Press (OUP)

Cited by 3 articles.