Fast and Memory-Efficient Approximate Minimum Spanning Tree Generation for Large Datasets

Published: 2024-06-21
ISSN: 2193-567X
Journal: Arabian Journal for Science and Engineering
Language: en
Journal (abbreviated): Arab J Sci Eng

Authors:

Almansoori, Mahmood K. M.; Meszaros, Andras; Telek, Miklos

Abstract

Conventional minimum spanning tree (MST) algorithms typically start by building a distance matrix over all $n(n-1)/2$ pairs of data points, which leads to a time complexity of $O(n^2)$. This initial step is a computational bottleneck. To overcome it, we present a novel method that constructs an initial random $k$-neighbor graph and refines it with a crawling technique that efficiently approximates the $k$-nearest-neighbor (kNN) graph, i.e., the closest neighbors of each node. The approximate kNN graph is then used to build an initial approximate MST, which is iteratively refined by the same crawling process. With this approach, an approximate MST for a dataset of size $n$ can be obtained at an empirical cost of around $O(n^{1.07})$ and with a minimal $O(n)$ memory footprint. In contrast to spatial-tree-based approaches, the method also scales well to high-dimensional data. We show that the proposed approach achieves this level of performance with only a marginal accuracy reduction of between 0.5% and 6%. These qualities make it an excellent candidate for approximate MST computation on large, high-dimensional datasets.
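The pipeline summarized above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and several choices are assumptions: the abstract does not spell out the crawling rules, so an NN-Descent-style search over neighbors and neighbors of neighbors stands in for them; the MST is extracted with Kruskal's algorithm over the approximate kNN edges; and the paper's iterative MST refinement step is omitted. All function names (`random_knn_graph`, `crawl`, `approx_mst`, `exact_mst_weight`) are hypothetical. An exact $O(n^2)$ Prim baseline, which examines every one of the $n(n-1)/2$ pairs, is included for comparison.

```python
import math
import random


def dist(points, i, j):
    """Euclidean distance between points i and j, computed on demand (no matrix)."""
    return math.dist(points[i], points[j])


def random_knn_graph(n, k, rng):
    """Initial graph: every node gets k distinct random neighbours other than itself."""
    if not 0 < k < n:
        raise ValueError("need 0 < k < n")
    graph = []
    for i in range(n):
        nbrs = set()
        while len(nbrs) < k:
            j = rng.randrange(n)
            if j != i:
                nbrs.add(j)
        graph.append(list(nbrs))
    return graph


def crawl(points, graph, k, max_iters=10):
    """Refine each node's neighbour list by crawling to neighbours of neighbours."""
    # Assumption: an NN-Descent-style search stands in for the paper's "crawling";
    # its exact rules may differ. Only O(n * k) neighbour ids persist between rounds.
    n = len(points)
    for _ in range(max_iters):
        # Reverse edges let a node also crawl from nodes that point at it.
        reverse = [[] for _ in range(n)]
        for i, nbrs in enumerate(graph):
            for j in nbrs:
                reverse[j].append(i)
        changed = False
        new_graph = []
        for i in range(n):
            near = set(graph[i]) | set(reverse[i])
            cand = set(near)
            for j in near:
                cand.update(graph[j])
            cand.discard(i)
            # Keep the k closest candidates found so far.
            best = sorted(cand, key=lambda j: dist(points, i, j))[:k]
            changed |= set(best) != set(graph[i])
            new_graph.append(best)
        graph = new_graph
        if not changed:
            break
    return graph


def kruskal(n, edges):
    """Kruskal's algorithm with union-find; edges are (w, u, v), returns (u, v, w)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


def approx_mst(points, k, seed=0):
    """Approximate MST from an approximate kNN graph (a forest if it is disconnected)."""
    n = len(points)
    graph = crawl(points, random_knn_graph(n, k, random.Random(seed)), k)
    edges = {}
    for i, nbrs in enumerate(graph):
        for j in nbrs:
            edges[(min(i, j), max(i, j))] = dist(points, i, j)
    return kruskal(n, [(w, u, v) for (u, v), w in edges.items()])


def exact_mst_weight(points):
    """Exact baseline: O(n^2) Prim that examines all n(n-1)/2 pairs."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], dist(points, u, v))
    return total
```

For example, `approx_mst(points, k=10)` returns a list of `(u, v, w)` edges whose total weight can be compared against `exact_mst_weight(points)`. If the approximate kNN graph happens to be disconnected, the result is a spanning forest with fewer than $n-1$ edges; handling that case, and reaching the reported $O(n^{1.07})$ empirical cost, presumably relies on the paper's refinement step, which this sketch does not reproduce.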

Funder

Budapest University of Technology and Economics

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s13369-024-08974-y.pdf

References (45 articles)

1. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 100(1), 68–86 (1971)

2. Xu, D.; Tian, Y.: A comprehensive survey of clustering algorithms. Ann. Data Sci. 2, 165–193 (2015)

3. Jothi, R.; Mohanty, S.K.; Ojha, A.: Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Comput. Biol. Med. 71, 135–148 (2016)

4. Mohapatra, C.; Ray, B.B.: A survey on large datasets minimum spanning trees. In: International Symposium on Artificial Intelligence, pp. 26–35. Springer, Berlin (2022)

5. Juszczak, P.; Tax, D.M.; Pękalska, E.; et al.: Minimum spanning tree based one-class classifier. Neurocomputing 72(7–9), 1859–1869 (2009)