Phylogenetic Analysis of SARS-CoV-2 Data Is Difficult

Author:

Morel Benoit (1), Barbera Pierre (1), Czech Lucas (2), Bettisworth Ben (1), Hübner Lukas (1, 3), Lutteropp Sarah (1), Serdari Dora (1), Kostaki Evangelia-Georgia (4), Mamais Ioannis (5), Kozlov Alexey M (1), Pavlidis Pavlos (6), Paraskevis Dimitrios (4), Stamatakis Alexandros (1, 3)

Affiliation:

1. Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany

2. Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, USA

3. Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany

4. Department of Hygiene Epidemiology and Medical Statistics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece

5. Department of Health Sciences, European University Cyprus, Nicosia, Cyprus

6. Institute of Computer Science, Foundation for Research and Technology-Hellas, Crete, Greece

Abstract

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising a quality-filtered subset of 8,736 out of all 16,453 virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be credible. Finally, an automatic classification of the current sequences into subclasses using the mPTP tool for molecular species delimitation is also, as might be expected, not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.

Funder

Klaus Tschira Foundation

Publisher

Oxford University Press (OUP)

Subject

Genetics, Molecular Biology, Ecology, Evolution, Behavior and Systematics
