Abstract
The rooted triplet distance measures the structural dissimilarity of two phylogenetic trees or phylogenetic networks by counting the number of rooted phylogenetic trees with exactly three leaf labels (called rooted triplets, or triplets for short) that occur as embedded subtrees in one, but not both, of them. Suppose that $$N_1 = (V_1, E_1)$$ and $$N_2 = (V_2, E_2)$$ are phylogenetic networks over a common leaf label set of size $$n$$, that $$N_i$$ has level $$k_i$$ and maximum in-degree $$d_i$$ for $$i \in \{1,2\}$$, and that the networks’ out-degrees are unbounded. Write $$N = \max(|V_1|, |V_2|)$$, $$M = \max(|E_1|, |E_2|)$$, $$k = \max(k_1, k_2)$$, and $$d = \max(d_1, d_2)$$. Previous work has shown how to compute the rooted triplet distance between $$N_1$$ and $$N_2$$ in $$\mathrm{O}(n \log n)$$ time in the special case $$k \le 1$$. For $$k > 1$$, no efficient algorithms are known; applying a classic method from 1980 by Fortune et al. in a direct way leads to a running time of $$\Omega(N^{6} n^{3})$$ and the only existing non-trivial algorithm imposes restrictions on the networks’ in- and out-degrees (in particular, it does not work when non-binary vertices are allowed). In this article, we develop two new algorithms with no such restrictions. Their running times are $$\mathrm{O}(N^{2} M + n^{3})$$ and $$\mathrm{O}(M + N k^{2} d^{2} + n^{3})$$, respectively. We also provide implementations of our algorithms, evaluate their performance on simulated and real datasets, and make some observations on the limitations of the current definition of the rooted triplet distance in practice. Our prototype implementations have been packaged into the first publicly available software for computing the rooted triplet distance between unrestricted networks of arbitrary levels.
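
To make the definition concrete, the following is a minimal brute-force sketch restricted to trees (level 0); it is not one of the algorithms developed in the article and does not handle networks. It assumes a hypothetical input representation in which a tree is a nested tuple of children and a leaf is a string label, and it reads the definition literally, returning the size of the symmetric difference of the two triplet sets. The function names `leaf_paths`, `lca_depth`, `triplets`, and `rooted_triplet_distance` are illustrative only.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to the tuple of child indices on its root-to-leaf path."""
    if isinstance(tree, str):  # assumption: a leaf is a plain string label
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for a, b in zip(p, q):
        if a != b:
            break
        depth += 1
    return depth


def triplets(tree):
    """Return the set of rooted triplets embedded in a tree.

    A resolved triplet xy|z is encoded as (frozenset({x, y}), z);
    a fan triplet x|y|z is encoded as (frozenset({x, y, z}), None).
    """
    paths = leaf_paths(tree)
    result = set()
    for x, y, z in combinations(sorted(paths), 3):
        dxy = lca_depth(paths[x], paths[y])
        dxz = lca_depth(paths[x], paths[z])
        dyz = lca_depth(paths[y], paths[z])
        # In a tree, two of the three depths are equal and the third is >= them.
        if dxy > dxz:
            result.add((frozenset((x, y)), z))
        elif dxz > dxy:
            result.add((frozenset((x, z)), y))
        elif dyz > dxy:
            result.add((frozenset((y, z)), x))
        else:
            result.add((frozenset((x, y, z)), None))
    return result


def rooted_triplet_distance(t1, t2):
    """Number of triplets embedded in exactly one of the two trees."""
    return len(triplets(t1) ^ triplets(t2))


# Two small trees over the leaf label set {a, b, c, d}.
T1 = (("a", "b"), ("c", "d"))
T4 = ((("a", "b"), "c"), "d")
print(rooted_triplet_distance(T1, T4))
```

Here `T1` and `T4` share the triplets ab|c and ab|d and disagree on the leaf sets {a, c, d} and {b, c, d}. The sketch enumerates all leaf triples, so it is only practical for small inputs.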
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, General Computer Science