A near-optimal approach to edge connectivity-based hierarchical graph decomposition

Published:2023-05-06 Issue:1 Volume:33 Page:49-71
ISSN:1066-8888
Container-title:The VLDB Journal
language:en
Short-container-title:The VLDB Journal

Author:

Chang Lijun, Wang Zhiyi

Abstract

The problem of efficiently computing all $k$-edge-connected components ($k$-ECCs) of a graph $G$ for a user-given $k$ has been extensively studied recently in view of its importance in many applications. The $k$-ECCs of $G$ for all possible values of $k$ form a hierarchical structure; that is, any two different $k$-ECCs for the same $k$ value are disjoint and any $k$-ECC is contained in a unique $(k-1)$-ECC. In this paper, we study the problem of efficiently constructing the hierarchy tree of the $k$-ECCs for all possible $k$ values, for a graph $G$. The existing approaches $\mathsf{TD}$ and $\mathsf{BU}$ construct the hierarchy tree in either a top-down manner or a bottom-up manner, with both having the time complexity of $\mathcal{O}\big(\delta(G) \times \mathsf{T_{KECC}}(G)\big)$, where $\delta(G)$ is the degeneracy of $G$ and $\mathsf{T_{KECC}}(G)$ is the time complexity of computing all $k$-ECCs of $G$ for a specific $k$ value. Here, the degeneracy of $G$ is defined as the maximum value among the minimum vertex degrees of all subgraphs of $G$ and is at most $\sqrt{m}$, where $m$ is the number of edges in $G$. To improve the time complexity, we propose a divide-and-conquer approach $\mathsf{DC}$ running in $\mathcal{O}\big((\log \delta(G)) \times \mathsf{T_{KECC}}(G)\big)$ time; this time complexity is optimal up to a logarithmic factor. However, a straightforward implementation of $\mathsf{DC}$ would take $\mathcal{O}\big((m + n) \log \delta(G)\big)$ main-memory space, which could easily run out of memory when processing large graphs; here, $n$ is the number of vertices in $G$. To reduce the main-memory footprint of our algorithm, we propose adjacency array-based techniques to optimize the space complexity to $2m + \mathcal{O}(n \log \delta(G))$ and denote our resulting algorithm by $\mathsf{DC\text{-}AA}$. As a by-product of $\mathsf{DC\text{-}AA}$, we also improve the space complexity of the state-of-the-art algorithm for computing all $k$-ECCs for a specific $k$ to $2m + \mathcal{O}(n)$, by using the same technique as used in $\mathsf{DC\text{-}AA}$. Finally, we propose optimization techniques to improve the practical efficiency of the existing approach $\mathsf{BU}$ and denote the space-optimized version of it as $\mathsf{BU^*\text{-}AA}$, which runs in $\mathcal{O}\big(\delta(G) \times \mathsf{T_{KECC}}(G)\big)$ time and $2m + \mathcal{O}(n)$ space. Extensive experiments on large real graphs and synthetic graphs demonstrate that our algorithms $\mathsf{DC\text{-}AA}$ and $\mathsf{BU^*\text{-}AA}$ outperform the state-of-the-art approaches by up to 28 times in terms of running time and by up to 8 times in terms of main memory usage. In particular, our approach $\mathsf{BU^*\text{-}AA}$ processes the Twitter graph, which has more than 1 billion undirected edges, in 29 min with 13.5 GB memory, while the state-of-the-art approaches take more than 13 h after our space optimization; note that the state-of-the-art approaches run out of memory without our space optimization. Our empirical study also shows that $\mathsf{BU^*\text{-}AA}$, despite having a higher time complexity, performs better than $\mathsf{DC\text{-}AA}$ in practice. We also remark that $\mathsf{BU^*\text{-}AA}$ is much simpler and easier to implement than $\mathsf{DC\text{-}AA}$.
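The degeneracy used in the complexity bounds above can be made concrete with a small sketch. The code below is not taken from the paper; it is a minimal illustration, assuming an undirected simple graph given as an edge list, of the standard peeling procedure (repeatedly delete a vertex of minimum remaining degree; the degeneracy is the largest degree seen at deletion time). The function name `degeneracy` and the input format are assumptions made for this example, and the loop is a simple quadratic version rather than the linear-time bucket-based variant.

```python
from collections import defaultdict


def degeneracy(edges):
    """Degeneracy of an undirected simple graph given as (u, v) pairs.

    Illustrative sketch only: repeatedly removes a vertex of minimum
    remaining degree and records the largest such degree.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    deg = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    best = 0
    while remaining:
        # Pick a vertex with the smallest degree in the remaining subgraph.
        v = min(remaining, key=deg.__getitem__)
        best = max(best, deg[v])
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                deg[w] -= 1
    return best


# A triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
print(degeneracy([(0, 1), (1, 2), (0, 2), (2, 3)]))  # 2
```

In this example the pendant vertex is peeled first at degree 1, after which every triangle vertex has degree 2, so the result is 2. Since a vertex inside a $k$-ECC has at least $k$ neighbors within that component, no $k$-ECC with more than one vertex exists for $k > \delta(G)$, which is why $\delta(G)$ bounds the number of levels in the hierarchy.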

Funder

Australian Research Council

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems

Link

https://link.springer.com/content/pdf/10.1007/s00778-023-00797-x.pdf

References (43 articles)

1. Aggarwal, C.C., Xie, Y., Philip, S.Y.: Gconnect: a connectivity index for massive disk-resident graphs. PVLDB 2(1), 862–873 (2009)

2. Agrawal, R., Rajagopalan, S., Srikant, R., Xu, Y.: Mining newsgroups using networks arising from social behavior. In: Proceedings of WWW’03, pp. 529–535 (2003)

3. Akiba, T., Iwata, Y., Yoshida, Y.: Linear-time enumeration of maximal k-edge-connected subgraphs in large networks by random contraction. In: Proceedings of CIKM’13, pp. 909–918 (2013)

4. Batagelj, V., Zaversnik, M.: An O(m) algorithm for cores decomposition of networks. CoRR arXiv:cs.DS/0310049 (2003)

5. Benczúr, A.A., Karger, D.R.: Randomized approximation schemes for cuts and flows in capacitated graphs. CoRR arXiv:cs.DS/0207078 (2002)