Author:
Wawerka Marcin,Dąbkowski Dawid,Rutecka Natalia,Mykowiecka Agnieszka,Górecki Paweł
Abstract
Background
Phylogenetic networks are mathematical models of evolutionary processes involving reticulate events such as hybridization, recombination, or horizontal gene transfer. One of the crucial notions in phylogenetic network modelling is the displayed tree, which is obtained from a network by removing a set of reticulation edges. Displayed trees may represent an evolutionary history of a gene family if the evolution is shaped by reticulation events.
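To make the notion of a displayed tree concrete, the following is a minimal sketch, not taken from the paper, that enumerates the possible sets of removed reticulation edges for a toy network. It assumes a network stored as a hypothetical `parents` mapping (node to list of parents), treats any node with more than one parent as a reticulation node, and keeps exactly one incoming edge per reticulation. It only lists the edge choices; contracting the resulting degree-2 nodes to obtain the actual tree is omitted.

```python
from itertools import product

def displayed_tree_edge_sets(parents):
    """Yield (kept, removed) reticulation-edge sets, one pair per choice.

    `parents` (hypothetical representation) maps each node to the list of
    its parent nodes; a node with two or more parents is a reticulation.
    """
    reticulations = sorted(v for v, ps in parents.items() if len(ps) > 1)
    # Pick exactly one parent for every reticulation node.
    for choice in product(*(parents[v] for v in reticulations)):
        kept = {(p, v) for v, p in zip(reticulations, choice)}
        all_ret_edges = {(p, v) for v in reticulations for p in parents[v]}
        yield kept, all_ret_edges - kept

# Toy network (illustrative only): root -> x, y; x -> a, h; y -> h, c; h -> b.
toy_parents = {
    "x": ["root"], "y": ["root"],
    "a": ["x"], "c": ["y"],
    "h": ["x", "y"],  # reticulation node with two incoming edges
    "b": ["h"],
}
choices = list(displayed_tree_edge_sets(toy_parents))
```

With r reticulation nodes of in-degree two, this enumeration produces 2^r choices, which is why exhaustive enumeration becomes costly as r grows.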
Results
We address the problem of inferring an optimal tree displayed by a network, given a gene tree G and a tree-child network N, under the deep coalescence and duplication costs. We propose an O(mn)-time dynamic programming algorithm (DP) to compute a lower bound of the optimal displayed tree cost, where m and n are the sizes of G and N, respectively. In addition, our algorithm can verify whether the solution is exact. Moreover, it provides a set of reticulation edges corresponding to the obtained cost. If the cost is exact, the set induces an optimal displayed tree. Otherwise, the set contains pairs of conflicting edges, i.e., edges sharing a reticulation node. Next, we show a conflict resolution algorithm that requires $$2^{r+1}-1$$ invocations of DP in the worst case, where r is the number of reticulations. We propose a similar $$O(2^{k}mn)$$-time algorithm for level-k tree-child networks and a branch and bound solution to compute lower and upper bounds of optimal costs. We also extend the algorithms to a broader class of phylogenetic networks. Based on simulated data, the average runtime is $$\Theta(2^{0.543k}mn)$$ under the deep-coalescence cost and $$\Theta(2^{0.355k}mn)$$ under the duplication cost.
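The worst-case count of $$2^{r+1}-1$$ DP invocations matches the number of nodes in a full binary recursion tree of depth r. The sketch below illustrates that arithmetic only. It assumes, as a simplification not confirmed by the abstract, that each DP call reporting a conflict branches into two subproblems (one per incoming edge of a conflicting reticulation). The actual DP is replaced by a hypothetical predicate `dp_reports_conflict`.

```python
def count_dp_calls(unresolved, dp_reports_conflict):
    """Count DP invocations in a binary branching over reticulations.

    `unresolved` is a list of reticulations not yet fixed;
    `dp_reports_conflict` is a stand-in (assumption) for running the DP
    and checking whether its edge set still contains a conflicting pair.
    """
    calls = 1  # one DP invocation on the current, partially resolved network
    if unresolved and dp_reports_conflict(unresolved):
        rest = unresolved[1:]
        # Branch on the first reticulation: keep one incoming edge or the other.
        calls += count_dp_calls(rest, dp_reports_conflict)
        calls += count_dp_calls(rest, dp_reports_conflict)
    return calls

def worst_case(r):
    # Worst case: every DP call with unresolved reticulations reports a conflict.
    return count_dp_calls(list(range(r)), lambda unresolved: True)

def best_case(r):
    # Best case: the first DP call is already exact, so no branching occurs.
    return count_dp_calls(list(range(r)), lambda unresolved: False)
```

Under this assumption the recurrence is T(0) = 1 and T(r) = 1 + 2T(r-1), which solves to 2^{r+1} - 1. When the first DP result is already exact, a single invocation suffices.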
Conclusions
Despite exponential complexity in the worst case, our algorithms perform well on empirical and simulated datasets, due to the strategy of resolving internal dissimilarities between gene trees and networks. Therefore, the algorithms are efficient alternatives to enumeration strategies commonly proposed in the literature and enable analyses of complex networks with dozens of reticulations.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computational Theory and Mathematics,Molecular Biology,Structural Biology
Cited by
4 articles.