Author:
Bansal Mukul S,Burleigh J Gordon,Eulenstein Oliver
Abstract
Abstract
Background
Genomic data provide a wealth of new information for phylogenetic analysis. Yet making use of this data requires phylogenetic methods that can efficiently analyze extremely large data sets and account for processes of gene evolution, such as gene duplication and loss, incomplete lineage sorting (deep coalescence), or horizontal gene transfer, that cause incongruence among gene trees. One such approach is gene tree parsimony, which, given a set of gene trees, seeks a species tree that requires the smallest number of evolutionary events to explain the incongruence of the gene trees. However, the only existing algorithms for gene tree parsimony under the duplication-loss or deep coalescence reconciliation cost are prohibitively slow for large datasets.
Results
We describe novel algorithms for SPR and TBR based local search heuristics under the duplication-loss cost, and we show how they can be adapted for the deep coalescence cost. These algorithms improve upon the best existing algorithms for these problems by a factor of n, where n is the number of species in the collection of gene trees. We implemented our new SPR based local search algorithm for the duplication-loss cost and demonstrate the tremendous improvement in runtime and scalability it provides compared to existing implementations. We also evaluate the performance of our algorithm on three large-scale genomic data sets.
Conclusion
Our new algorithms enable, for the first time, gene tree parsimony analyses of thousands of genes from hundreds of taxa using the duplication-loss and deep coalescence reconciliation costs. Thus, this work expands both the size of data sets and the range of evolutionary models that can be incorporated into genome-scale phylogenetic analyses.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference36 articles.
1. Maddison WP: Gene Trees in Species Trees. Systematic Biology 1997, 46: 523-536.
2. Degnan JH, Rosenberg NA: Discordance of Species Trees with Their Most Likely Gene Trees. PLoS Genetics 2006, 2(5):e68. 10.1371/journal.pgen.0020068
3. Goodman M, Czelusniak J, Moore GW, Romero-Herrera AE, Matsuda G: Fitting the gene lineage into its species lineage. A parsimony strategy illustrated by cladograms constructed from globin sequences. Systematic Zoology 1979, 28: 132-163. 10.2307/2412519
4. Page RDM: Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Systematic Biology 1994, 43: 58-77.
5. Guigó R, Muchnik I, Smith TF: Reconstruction of Ancient Molecular Phylogeny. Molecular Phylogenetics and Evolution 1996, 6(2):189-213. 10.1006/mpev.1996.0071
Cited by
43 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献