Maximum likelihood pandemic-scale phylogenetics

Published: 2022-03-22

Author:

De Maio Nicola, Kalaghatgi Prabhav, Turakhia Yatish, Corbett-Detig Russell, Minh Bui Quang, Goldman Nick

Abstract

Phylogenetics plays a crucial role in the interpretation of genomic data [1]. Phylogenetic analyses of SARS-CoV-2 genomes have allowed the detailed study of the virus’s origins [2], of its international [3,4] and local [4–9] spread, and of the emergence [10] and reproductive success [11] of new variants, among many applications. These analyses have been enabled by the unparalleled volumes of genome sequence data generated and employed to study and help contain the pandemic [12]. However, preferred model-based phylogenetic approaches including maximum likelihood and Bayesian methods, mostly based on Felsenstein’s ‘pruning’ algorithm [13,14], cannot scale to the size of the datasets from the current pandemic [4,15], hampering our understanding of the virus’s evolution and transmission [16]. We present new approaches, based on reworking Felsenstein’s algorithm, for likelihood-based phylogenetic analysis of epidemiological genomic datasets at unprecedented scales. We exploit near-certainty regarding ancestral genomes, and the similarities between closely related and densely sampled genomes, to greatly reduce computational demands for memory and time. Combined with new methods for searching amongst candidate evolutionary trees, this results in our MAPLE (‘MAximum Parsimonious Likelihood Estimation’) software giving better results than popular approaches such as FastTree 2 [17], IQ-TREE 2 [18], RAxML-NG [19] and UShER [15]. Our approach therefore allows complex and accurate probabilistic phylogenetic analyses of millions of microbial genomes, extending the reach of genomic epidemiology. Future epidemiological datasets are likely to be even larger than those currently associated with COVID-19, and other disciplines such as metagenomics and biodiversity science are also generating huge numbers of genome sequences [20–22]. Our methods will permit continued use of preferred likelihood-based phylogenetic analyses.
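For readers unfamiliar with the ‘pruning’ algorithm the abstract refers to, the following is a minimal sketch of the classic version for a single alignment site. It is not MAPLE's reworked algorithm. It assumes the Jukes-Cantor (JC69) substitution model, and the nested-list tree encoding and all function names are hypothetical, chosen only for illustration.

```python
import math

BASES = "ACGT"


def jc69_prob(i, j, t):
    """JC69 probability of base index i changing to base index j over a
    branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node):
    """Return the four partial likelihoods for the subtree rooted at `node`.

    Hypothetical encoding: a leaf is a one-character string such as "A";
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        # A leaf is fully observed: likelihood 1 for its base, 0 otherwise.
        return [1.0 if b == node else 0.0 for b in BASES]
    result = [1.0] * 4
    for child, t in node:
        child_l = partial_likelihoods(child)
        for i in range(4):
            # Sum over the child's possible states, weighted by the
            # probability of reaching each state along the branch.
            result[i] *= sum(jc69_prob(i, j, t) * child_l[j] for j in range(4))
    return result


def site_likelihood(root):
    """Likelihood of one site, with uniform base frequencies at the root."""
    return sum(0.25 * l for l in partial_likelihoods(root))


# Usage: a three-leaf tree ((A:0.1, A:0.2):0.05, C:0.3) for a single site.
tree = [([("A", 0.1), ("A", 0.2)], 0.05), ("C", 0.3)]
print(site_likelihood(tree))
```

This recursion stores four numbers at every node for every site, so its cost grows with the number of genomes times the genome length. That is the cost the abstract says becomes prohibitive at pandemic scale.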

Publisher

Cold Spring Harbor Laboratory

References: 70 articles.

1. Molecular phylogenetics: principles and practice

2. Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic;Nature Microbiology,2020

3. Genomic epidemiology reveals multiple introductions of SARS-CoV-2 from mainland Europe into Scotland;Nature Microbiology,2021

4. Establishment and lineage dynamics of the SARS-CoV-2 epidemic in the UK

5. Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California

Cited by 7 articles.

1. Rate variation and recurrent sequence errors in pandemic-scale phylogenetics;2024-07-15

2. An AI Agent for Fully Automated Multi-omic Analyses;2023-09-12

3. SARS-CoV-2 lineage assignments using phylogenetic placement/UShER are superior to pangoLEARN machine learning method;2023-05-27

4. Fidelity of hyperbolic space for Bayesian phylogenetic inference;PLOS Computational Biology;2023-04-26

5. Data Integration in Bayesian Phylogenetics;Annual Review of Statistics and Its Application;2023-03-10