Phylogenetic analysis of SARS-CoV-2 data is difficult

Published: 2020-08-06

Author:

Morel Benoit, Barbera Pierre, Czech Lucas, Bettisworth Ben, Hübner Lukas, Lutteropp Sarah, Serdari Dora, Kostaki Evangelia-Georgia, Mamais Ioannis, Kozlov Alexey M, Pavlidis Pavlos, Paraskevis Dimitrios, Stamatakis Alexandros

Abstract

Numerous studies covering some aspects of SARS-CoV-2 data analyses are being published on a daily basis, including a regularly updated phylogeny on nextstrain.org. Here, we review the difficulties of inferring reliable phylogenies by example of a data snapshot comprising all virus sequences available on May 5, 2020 from gisaid.org. We find that it is difficult to infer a reliable phylogeny on these data due to the large number of sequences in conjunction with the low number of mutations. We further find that rooting the inferred phylogeny with some degree of confidence either via the bat and pangolin outgroups or by applying novel computational methods on the ingroup phylogeny does not appear to be possible. Finally, an automatic classification of the current sequences into sub-classes based on statistical criteria is also not possible, as the sequences are too closely related. We conclude that, although the application of phylogenetic methods to disentangle the evolution and spread of COVID-19 provides some insight, results of phylogenetic analyses, in particular those conducted under the default settings of current phylogenetic inference tools, as well as downstream analyses on the inferred phylogenies, should be considered and interpreted with extreme caution.
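The abstract's central point, many sequences combined with few mutations, can be illustrated with a small sketch. It is not taken from the paper: the function names and the toy alignment are hypothetical, chosen only to show how quickly the number of candidate tree topologies grows with the number of sequences while the number of variable alignment columns can stay very small.

```python
from math import log10


def log10_unrooted_topologies(n_taxa: int) -> float:
    """Return log10 of (2n-5)!!, the number of unrooted binary trees on n labeled tips."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    # (2n-5)!! = 1 * 3 * 5 * ... * (2n-5); sum logs to avoid huge integers.
    return sum(log10(k) for k in range(3, 2 * n_taxa - 4, 2))


def variable_sites(alignment: list[str]) -> int:
    """Count alignment columns that contain more than one distinct unambiguous base."""
    count = 0
    for column in zip(*alignment):
        bases = {base for base in column if base in "ACGT"}
        if len(bases) > 1:
            count += 1
    return count


# Hypothetical toy alignment (not real SARS-CoV-2 data): four sequences, one mutation.
toy_alignment = ["ACGTACGT", "ACGTACGT", "ACGTACGA", "ACGTACGT"]

if __name__ == "__main__":
    print("variable sites in toy alignment:", variable_sites(toy_alignment))
    for n in (10, 100, 1000):
        print(f"{n} taxa: about 10^{log10_unrooted_topologies(n):.0f} unrooted topologies")
```

With only a handful of variable columns, a very large number of these topologies explain the data almost equally well, which is one way to read the abstract's call for caution.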

Publisher

Cold Spring Harbor Laboratory

References: 45 articles.

1. Kangpeng Xiao, Junqiong Zhai, Yaoyu Feng, Niu Zhou, Xu Zhang, Jie-Jian Zou, Na Li, Yaqiong Guo, Xiaobing Li, Xuejuan Shen, et al. Isolation of SARS-CoV-2-related coronavirus from Malayan pangolins. Nature, pages 1–4, 2020.

2. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics, 2018.

3. Adam Brufsky. Distinct viral clades of SARS-CoV-2: Implications for modeling of viral spread. Journal of Medical Virology, 2020.

4. Andrew Rambaut, Edward C Holmes, Verity Hill, Áine O'Toole, John McCrone, Chris Ruis, Louis du Plessis, and Oliver Pybus. A dynamic nomenclature proposal for SARS-CoV-2 to assist genomic epidemiology. bioRxiv, 2020.

Cited by 8 articles.

1. Comparison of SARS-CoV-2 sequencing using the ONT GridION and the Illumina MiSeq; BMC Genomics; 2022-04-22

2. A Computational Framework for Pattern Detection on Unaligned Sequences: An Application on SARS-CoV-2 Data; Frontiers in Genetics; 2021-05-28

3. The first wave of the Spanish COVID-19 epidemic was associated with early introductions and fast spread of a dominating genetic variant; 2020-12-22

4. No evidence for increased transmissibility from recurrent mutations in SARS-CoV-2; Nature Communications; 2020-11-25

5. Authors' Reply to: Errors in Tracing Coronavirus SARS-CoV-2 Transmission Using a Maximum Likelihood Tree. Comment on “A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis”; JMIR Public Health and Surveillance; 2020-11-11