Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge

Published: 2018-11-13

Author:

Molloy Erin K., Warnow Tandy

Abstract

Background: Divide-and-conquer methods, which divide the species set into overlapping subsets, construct a tree on each subset, and then combine the subset trees using a supertree method, provide a key algorithmic framework for boosting the scalability of phylogeny estimation methods to large datasets. Yet the use of supertree methods, which typically attempt to solve NP-hard optimization problems, limits the scalability of such approaches.

Results: In this paper, we introduce a divide-and-conquer approach that does not require supertree estimation: we divide the species set into pairwise disjoint subsets, construct a tree on each subset using a base method, and then combine the subset trees using a distance matrix. For this merger step, we present a new method, called NJMerge, which is a polynomial-time extension of Neighbor Joining (NJ); thus, NJMerge can be viewed either as a method for improving traditional NJ or as a method for scaling the base method to larger datasets. We prove that NJMerge can be used to create divide-and-conquer pipelines that are statistically consistent under some models of evolution. We also report the results of an extensive simulation study evaluating NJMerge on multi-locus datasets with up to 1000 species. We found that NJMerge sometimes improved the accuracy of traditional NJ and substantially reduced the running time of three popular species tree methods (ASTRAL-III, SVDquartets, and “concatenation” using RAxML) without sacrificing accuracy. Finally, although NJMerge can fail to return a tree, in our experiments, NJMerge failed on only 11 out of 2560 test cases.

Conclusions: Theoretical and empirical results suggest that NJMerge is a valuable technique for large-scale phylogeny estimation, especially when computational resources are limited. NJMerge is freely available on GitHub (http://github.com/ekmolloy/njmerge).
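For orientation, the classical Neighbor Joining procedure that the abstract says NJMerge extends can be sketched roughly as below. This is plain NJ only: it does not use subset trees as constraints and is not the NJMerge algorithm or its code. The function name `neighbor_joining_splits`, the dictionary layout of the distance matrix, and the toy distances are illustrative assumptions.

```python
from itertools import combinations


def neighbor_joining_splits(taxa, dist):
    """Run classical NJ and return the non-trivial splits of the unrooted tree.

    taxa: list of leaf names.
    dist: dict mapping frozenset({a, b}) -> distance between leaves a and b.
    Each split is reported as the side that does not contain taxa[0].
    """
    all_taxa = frozenset(taxa)
    # Every active node is identified by the set of leaves below it.
    nodes = [frozenset([t]) for t in taxa]
    d = {frozenset([frozenset([a]), frozenset([b])]): dist[frozenset([a, b])]
         for a, b in combinations(taxa, 2)}
    splits = set()
    # With three nodes left the unrooted tree is fully resolved.
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {x: sum(d[frozenset([x, y])] for y in nodes if y != x)
             for x in nodes}
        # Join the pair that minimizes the standard NJ Q-criterion.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        u = i | j
        # Distances from the new internal node to every remaining node.
        for k in nodes:
            if k != i and k != j:
                d[frozenset([u, k])] = 0.5 * (d[frozenset([i, k])]
                                              + d[frozenset([j, k])]
                                              - d[frozenset([i, j])])
        nodes = [k for k in nodes if k != i and k != j] + [u]
        splits.add(u if taxa[0] not in u else all_taxa - u)
    return splits


# Toy additive distances for the quartet tree ab|cd (made-up numbers).
taxa = ["a", "b", "c", "d"]
pairs = {("a", "b"): 3, ("a", "c"): 9, ("a", "d"): 10,
         ("b", "c"): 10, ("b", "d"): 11, ("c", "d"): 7}
dist = {frozenset(p): v for p, v in pairs.items()}
print(neighbor_joining_splits(taxa, dist))
```

On the toy input the only non-trivial split separates {a, b} from {c, d}. Per the abstract, NJMerge additionally takes trees on pairwise disjoint subsets of the taxa and combines them using the distance matrix; that constrained merge step is deliberately left out of this sketch.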

Publisher

Cold Spring Harbor Laboratory

References: 58 articles.

1. Inferring a Tree from Lowest Common Ancestors with an Application to the Optimization of Relational Expressions;SIAM Journal on Computing,1981

2. Species Tree Inference from Gene Splits by Unrooted STAR Methods;IEEE/ACM Transactions on Computational Biology and Bioinformatics,2018

3. E. S. Allman, C. Long, and J. A. Rhodes. Species tree inference from genomic sequences using the log-det distance, 2018.

4. The Performance of Neighbor-Joining Methods of Phylogenetic Reconstruction;Algorithmica,1999

5. Robinson-Foulds supertrees;Algorithms for Molecular Biology,2010

Cited by 4 articles.

1. An empirical assessment of a single family‐wide hybrid capture locus set at multiple evolutionary timescales in Asteraceae;Applications in Plant Sciences;2019-10

2. TreeCluster: clustering biological sequences using phylogenetic trees;2019-03-28

3. Using INC Within Divide-and-Conquer Phylogeny Estimation;Algorithms for Computational Biology;2019

4. New Divide-and-Conquer Techniques for Large-Scale Phylogenetic Estimation;Algorithms for Computational Biology;2019