Affiliation:
1. Department of Computer Science, University of Sherbrooke, Sherbrooke, Quebec, Canada
Abstract
Phylogenetic trees represent the evolutionary relationships and ancestry of species or groups of organisms. Comparing these trees by measuring the distance between them is essential for applications such as tree clustering and the Tree of Life project. Many distance metrics for phylogenetic trees focus on trees defined on the same set of taxa. However, some problems require calculating distances between trees with different but overlapping sets of taxa. This study reviews state‐of‐the‐art distance measures for such trees, covering six major approaches: the constraint‐based Robinson–Foulds (RF) distance RF(−), the completion‐based RF(+), the generalized RF (GRF), the dissimilarity measure, the vectorial tree distance, and the geodesic distance in the extended Billera‐Holmes‐Vogtmann tree space. Among these, the three RF‐based methods, RF(−), RF(+), and GRF, were examined in detail on generated clusters of phylogenetic trees defined on different but mutually overlapping sets of taxa. Additionally, we reviewed nine related techniques, including leaf imputation methods, the tree edit distance, and visual comparison. A comparison of the related distance measures, highlighting their principal advantages and shortcomings, is provided. This review offers insights into their applicability and performance, guiding the appropriate use of these metrics based on tree type (rooted or unrooted) and information type (topology or branch lengths).
Funder
Natural Sciences and Engineering Research Council of Canada