Abstract
AbstractConventional minimum spanning tree (MST) algorithms typically start by creating a distance matrix of the $$n(n-1)/2$$
n
(
n
-
1
)
/
2
pairs of data points, leading to a time complexity of $$O(n^2)$$
O
(
n
2
)
. This initial step poses a computational bottleneck. To overcome this limitation, we present a novel method that constructs an initial random k-neighbor graph and optimizes this graph by employing a crawling technique to efficiently approximate the k Nearest Neighbors (kNN) graph. This crawling approach allows us to approximate the closest neighbors of each node. Subsequently, the approximate kNN graph is utilized to build an initial approximate MST and iteratively refine it by the same crawling process. Using this approach, an approximate MST can be obtained for a data set of size n with empirical cost around $$O(n^{1.07})$$
O
(
n
1.07
)
and a minimal O(n) memory consumption. In contrast to spatial tree-based approaches, the presented method also scales well to high dimensional data. We have shown that the proposed approach achieves such a level of performance with only a marginal accuracy reduction between 0.5% and 6%. These qualities make it an excellent candidate for approximate MST calculation for high-dimensional, large data sets.
Funder
Budapest University of Technology and Economics
Publisher
Springer Science and Business Media LLC