Adaptive RAxML-NG: Accelerating Phylogenetic Inference under Maximum Likelihood using Dataset Difficulty

Published: 2023-10 Issue: 10 Volume: 40
ISSN: 0737-4038
Container-title: Molecular Biology and Evolution
Language: en

Author:

Togkousidis Anastasis¹, Kozlov Oleksiy M¹, Haag Julia¹, Höhler Dimitri¹, Stamatakis Alexandros¹²³

Affiliation:

1. Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany

2. Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany

3. Biodiversity Computing Group, Institute of Computer Science, Foundation for Research and Technology - Hellas, GR-711 10 Heraklion, Crete, Greece

Abstract

Phylogenetic inferences under the maximum likelihood criterion deploy heuristic tree search strategies to explore the vast search space. Depending on the input dataset, searches from different starting trees might all converge to a single tree topology. Often, though, distinct searches infer multiple topologies with large log-likelihood score differences or yield topologically highly distinct, yet almost equally likely, trees. Recently, Haag et al. introduced an approach to quantify, and implemented machine learning methods to predict, the dataset difficulty with respect to phylogenetic inference. Easy multiple sequence alignments (MSAs) exhibit a single likelihood peak on their likelihood surface, associated with a single tree topology to which most, if not all, independent searches rapidly converge. As difficulty increases, multiple locally optimal likelihood peaks emerge, yet from highly distinct topologies. To make use of this information, we introduce and implement an adaptive tree search heuristic in RAxML-NG, which modifies the thoroughness of the tree search strategy as a function of the predicted difficulty. Our adaptive strategy is based upon three observations. First, on easy datasets, searches converge rapidly and can hence be terminated at an earlier stage. Second, overanalyzing difficult datasets is hopeless, and thus it suffices to quickly infer only one of the numerous almost equally likely topologies to reduce overall execution time. Third, more extensive searches are justified and required on datasets with intermediate difficulty. While the likelihood surface exhibits multiple locally optimal peaks in this case, a small proportion of them is significantly better. Our experimental results for the adaptive heuristic on 9,515 empirical and 5,000 simulated datasets with varying difficulty exhibit substantial speedups, especially on easy and difficult datasets (53% of total MSAs), where we observe average speedups of more than 10×. Further, approximately 94% of the inferred trees using the adaptive strategy are statistically indistinguishable from the trees inferred under the standard strategy (RAxML-NG).
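
The three observations in the abstract can be illustrated with a minimal Python sketch that maps a predicted difficulty score to a search plan. This is not the RAxML-NG implementation: the score range of 0 to 1, the cut-off values `EASY_MAX` and `HARD_MIN`, and the names `SearchPlan` and `plan_search` are all assumptions made for illustration, since the abstract gives no numeric thresholds.

```python
from dataclasses import dataclass

# Hypothetical cut-offs, chosen only for illustration; the abstract does not
# state where "easy" ends or "difficult" begins.
EASY_MAX = 0.3
HARD_MIN = 0.7


@dataclass(frozen=True)
class SearchPlan:
    category: str      # "easy", "intermediate", or "difficult"
    thoroughness: str  # "fast" or "thorough"


def plan_search(difficulty: float) -> SearchPlan:
    """Map a predicted difficulty (assumed to lie in [0, 1]) to a search plan."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty is assumed to lie in [0, 1]")
    if difficulty < EASY_MAX:
        # Observation 1: searches converge rapidly, so stop early.
        return SearchPlan("easy", "fast")
    if difficulty > HARD_MIN:
        # Observation 2: many near-equal topologies, so one quick search suffices.
        return SearchPlan("difficult", "fast")
    # Observation 3: a few peaks are significantly better, so search more.
    return SearchPlan("intermediate", "thorough")
```

Under these assumed cut-offs, only the middle band receives the more extensive search, which mirrors the abstract's claim that the largest speedups occur on easy and difficult datasets.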

Publisher

Oxford University Press (OUP)

Subject

Genetics; Molecular Biology; Ecology, Evolution, Behavior and Systematics

Link

https://academic.oup.com/mbe/advance-article-pdf/doi/10.1093/molbev/msad227/51927877/msad227.pdf

References: 31 articles.

1. Bayesian model adequacy and choice in phylogenetics;Bollback;Mol Biol Evol,2002

2. Evolutionary trees from DNA sequences: a maximum likelihood approach;Felsenstein;J Mol Evol,1981

3. Toward defining the course of evolution: minimum change for a specific tree topology;Fitch;Syst Biol,1971

4. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0;Guindon;Syst Biol,2010

5. From easy to hopeless—predicting the difficulty of phylogenetic analyses;Haag;Mol Biol Evol,2022

Cited by 5 articles.

1. Clinical and microbiological features of a cohort of patients with Acinetobacter baumannii bloodstream infections;European Journal of Clinical Microbiology & Infectious Diseases;2024-07-18

2. Much Ado About Nothing: Accelerating Maximum Likelihood Phylogenetic Inference via Early Stopping to evade (Over-)optimization;2024-07-08

3. The Influence of the Number of Tree Searches on Maximum Likelihood Inference in Phylogenomics;Systematic Biology;2024-06-28

4. Predicting Phylogenetic Bootstrap Values via Machine Learning;2024-03-06

5. Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution;The ISME Journal;2024-01