PhyloBench: A Benchmark for Evaluating Phylogenetic Programs

Published: 2024-05-11, Volume: 41, Issue: 6
ISSN: 0737-4038
Journal: Molecular Biology and Evolution
Language: en

Authors:

Sergey Spirin (1, 2), Andrey Sigorskikh (3), Aleksei Efremov (3), Dmitry Penzar (3, 4), Anna Karyagina (1, 5, 6)

Affiliations:

1. Belozersky Institute, Lomonosov Moscow State University, Moscow, Russia

2. Higher School of Economics, Moscow, Russia

3. Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Moscow, Russia

4. Artificial Intelligence Research Institute, Moscow, Russia

5. Gamaleya Center of Epidemiology and Microbiology, Moscow, Russia

6. Institute of Agricultural Biotechnology, Moscow, Russia

Abstract

Phylogenetic inference based on protein sequence alignment is a widely used procedure. Numerous phylogenetic algorithms have been developed, most of which have many parameters and options, so choosing a program, options, and parameters can be a nontrivial task. Previously, no benchmark for comparing phylogenetic programs on real protein sequences was publicly available. We have developed PhyloBench, a benchmark for evaluating the quality of phylogenetic inference, and used it to test a number of popular phylogenetic programs. PhyloBench is based on natural, not simulated, protein sequences of orthologous evolutionary domains. The accuracy of an inferred tree is measured as its distance to the corresponding species tree. We tested several tree-to-tree distance measures; the most reliable results were obtained with the Robinson–Foulds distance. Our results confirm recent findings that distance methods are more accurate than maximum likelihood (ML) and maximum parsimony. We tested the Bayesian program MrBayes on natural protein sequences and found that, on our datasets, it performs better than ML but worse than distance methods. Of the methods we tested, the Balanced Minimum Evolution method implemented in FastME yielded the best results in our tests. Alignments and reference species trees are available at https://mouse.belozersky.msu.ru/tools/phylobench/ together with a web interface that allows semi-automatic comparison of a user's method with a number of popular programs.
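The accuracy measure described above, the Robinson–Foulds (RF) distance between an inferred tree and the reference species tree, can be illustrated with a short sketch. RF counts the bipartitions (the splits of the leaf set produced by removing an internal edge) that occur in one tree but not the other. The sketch below assumes simple Newick input with unique, whitespace-free leaf names. It parses branch lengths and internal labels but ignores them, and it treats both trees as unrooted. The helper names (`parse_newick`, `splits`, `rf_distance`, `normalized_rf`) are hypothetical, and this is not PhyloBench's own code.

```python
"""Robinson-Foulds (RF) distance between two unrooted trees: a minimal sketch.

Assumptions (hypothetical helpers, not PhyloBench code): simple Newick input,
unique leaf names without whitespace; branch lengths and internal labels are
parsed but ignored.
"""


def parse_newick(text):
    """Parse a simple Newick string into nested tuples of leaf names."""
    s = "".join(text.split()).rstrip(";")  # drop all whitespace and the ';'
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        return s[start:pos]

    def skip_length():
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            while pos < len(s) and s[pos] not in "(),":
                pos += 1

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume '('
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ','
                children.append(parse_node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
            read_label()  # internal-node label, ignored
            skip_length()
            return tuple(children)
        name = read_label()
        skip_length()
        return name

    return parse_node()


def splits(tree):
    """Return (nontrivial bipartitions, leaf set) of a tree read as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the split A|B and its mirror B|A compare equal.
    """
    clades = []

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(collect(child) for child in node))
        clades.append(below)
        return below

    all_leaves = collect(tree)
    ref = min(all_leaves)
    result = set()
    for clade in clades:
        side = all_leaves - clade if ref in clade else clade
        # Skip trivial splits (one leaf vs. the rest) and the empty side
        # produced by the root clade, which contains every leaf.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result, all_leaves


def rf_distance(newick_a, newick_b):
    """Count bipartitions present in exactly one of the two trees."""
    splits_a, leaves_a = splits(parse_newick(newick_a))
    splits_b, leaves_b = splits(parse_newick(newick_b))
    if leaves_a != leaves_b:
        raise ValueError("trees must have the same leaf set")
    return len(splits_a ^ splits_b)


def normalized_rf(newick_a, newick_b):
    """RF scaled by 2(n - 3), its maximum for two fully resolved unrooted trees."""
    n = len(splits(parse_newick(newick_a))[1])
    return rf_distance(newick_a, newick_b) / (2 * (n - 3))


if __name__ == "__main__":
    species = "((A,B),(C,D),(E,F));"
    inferred = "((A,C),(B,D),(E,F));"
    print(rf_distance(species, inferred))    # 4
    print(normalized_rf(species, inferred))  # 0.6666666666666666
```

In this example, each six-leaf tree has three nontrivial splits. Two of the inferred tree's splits ({B,D} and {B,D,E,F}) are missing from the species tree, and two of the species tree's splits ({C,D} and {C,D,E,F}) are missing from the inferred tree, so RF = 4. Rewriting the species tree with a different root, for example `(A,B,((C,D),(E,F)));`, gives RF = 0, because the canonical "side without the reference leaf" encoding makes the comparison independent of rooting. The 2(n − 3) normalization assumes both trees are fully resolved. The abstract does not specify how PhyloBench normalizes RF or treats unresolved species trees. Established libraries such as DendroPy and ETE also provide RF implementations.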

Funder

Russian Science Foundation

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/mbe/advance-article-pdf/doi/10.1093/molbev/msae084/58195425/msae084.pdf
