Abstract
Inferring the most probable evolutionary tree given leaf nodes is an important problem in computational biology that reveals the evolutionary relationships between species. Because the number of possible tree topologies grows exponentially with the number of leaves, exhaustive search quickly becomes computationally infeasible, and no polynomial-time algorithm is known for finding the best tree. In this work, we propose a novel differentiable approach as an alternative to traditional heuristic-based combinatorial tree search methods in phylogeny. The optimization objective in this work is to find the most parsimonious tree, i.e., the tree that minimizes the total number of evolutionary changes. We empirically evaluate our method on randomly generated trees of up to 128 leaves, with each node represented by a protein sequence of length 256. Given only leaf node information, our method converges well (< 1% error for trees of up to 32 leaves and < 8% error for trees of up to 128 leaves), suggesting its potential for broader phylogenetic inference problems and for integration with end-to-end differentiable models. The code to reproduce the experiments in this paper can be found at https://github.ramith.io/diff-evol-tree-search.
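To make the search space and the parsimony objective concrete, the sketch below counts rooted binary topologies, (2n − 3)!! for n labeled leaves, and scores a fixed topology with the classic Fitch small-parsimony algorithm, which computes the minimum number of substitutions when only leaf sequences are known. This is not the authors' differentiable method; it only illustrates the quantity such a method tries to minimize. The nested-tuple tree encoding and the names `num_rooted_topologies`, `fitch_site`, and `fitch_score` are hypothetical choices for illustration.

```python
from math import prod
from typing import Tuple, Union

# Hypothetical encoding: a leaf is a sequence string,
# an internal node is a (left, right) tuple of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def num_rooted_topologies(n_leaves: int) -> int:
    """Number of rooted, binary, leaf-labeled trees: (2n - 3)!! = 1 * 3 * ... * (2n - 3)."""
    return prod(range(1, 2 * n_leaves - 2, 2))


def fitch_site(node: Tree, site: int) -> Tuple[set, int]:
    """Fitch bottom-up pass for one site: (candidate state set, minimum changes in subtree)."""
    if isinstance(node, str):
        return {node[site]}, 0
    left_set, left_cost = fitch_site(node[0], site)
    right_set, right_cost = fitch_site(node[1], site)
    common = left_set & right_set
    if common:
        # Children can agree on a state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: take the union and pay one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree: Tree) -> int:
    """Parsimony score of a fixed topology, summed over all aligned sites."""
    leaf = tree
    while not isinstance(leaf, str):
        leaf = leaf[0]
    return sum(fitch_site(tree, i)[1] for i in range(len(leaf)))


if __name__ == "__main__":
    leaves = ("AC", "AC", "GC", "GT")
    grouped = ((leaves[0], leaves[1]), (leaves[2], leaves[3]))
    mixed = ((leaves[0], leaves[2]), (leaves[1], leaves[3]))
    print(num_rooted_topologies(4), fitch_score(grouped), fitch_score(mixed))
```

With the four toy leaves above there are 15 possible rooted topologies; grouping the similar sequences scores 2 changes while the mixed grouping scores 3, so the topology itself determines the parsimony cost. Evaluating every topology this way is what becomes infeasible as n grows, which motivates search strategies such as the one proposed here.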
Publisher
Cold Spring Harbor Laboratory