Affiliation:
1. Nanjing University, Nanjing, China
2. Alibaba Group, Hangzhou, China
Abstract
Butterfly counting, which computes the number of butterflies (a cyclic graph motif) in a large graph, is a fundamental task with many applications in graph analysis. As graph data grows rapidly, butterfly counting becomes increasingly challenging because of its super-linear time complexity and large memory consumption. In this paper, we study I/O-efficient algorithms for butterfly counting on hierarchical memory. Existing algorithms of this kind cannot guarantee I/O optimality. Observing that, in order to count butterflies, it suffices to "witness" a subgraph instead of the whole structure, we propose a new class of algorithms called semi-witnessing algorithms. We prove that a semi-witnessing algorithm is not restricted by the lower bound Ω(|E|^2/MB) of a witnessing algorithm, and give a new bound of Ω(min(|E|^2/MB, |E|/|V| √M B)). We further develop the IOBufs algorithm, which approaches this I/O lower bound, and thus claim its optimality. Finally, we parallelize IOBufs to further improve its performance and scalability. Our experiments show that IOBufs significantly outperforms the state-of-the-art algorithms EMRC and BFC-EM. In addition, IOBufs scales to butterfly counting on the Clueweb graph, which has 37 billion edges and quintillions (10^18) of butterflies.
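To make the counting task concrete, the following is a minimal in-memory sketch in Python. It is not the IOBufs algorithm and makes no attempt at I/O efficiency; it is the textbook wedge-based baseline, assuming the usual definition of a butterfly as a 2x2 biclique in a bipartite graph. The function name `count_butterflies` and the edge-list input format are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Count butterflies (2x2 bicliques) in a bipartite graph.

    `edges` is an iterable of (left, right) pairs. Left labels must be
    mutually orderable (e.g. all ints) so each pair has a canonical form.
    This is an illustrative in-memory baseline, not an I/O-efficient method.
    """
    # Map each right vertex to the set of its left neighbours.
    left_neighbours = defaultdict(set)
    for left, right in edges:
        left_neighbours[right].add(left)

    # A wedge is a pair of left vertices sharing one right vertex.
    # Count how many right vertices each left pair has in common.
    wedge_count = defaultdict(int)
    for lefts in left_neighbours.values():
        for a, b in combinations(sorted(lefts), 2):
            wedge_count[(a, b)] += 1

    # Any two common right neighbours of a left pair close one butterfly.
    return sum(comb(c, 2) for c in wedge_count.values())


if __name__ == "__main__":
    # Complete bipartite graph with 2 left and 2 right vertices.
    k22 = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print(count_butterflies(k22))
```

The wedge enumeration is quadratic in the degree of each right vertex, which hints at why the super-linear cost and memory footprint mentioned in the abstract become a problem on large graphs.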
Funder
Leading-edge Technology Program of Jiangsu NSF
NSFC
National Key R&D Program of China
Publisher
Association for Computing Machinery (ACM)
Cited by
1 article.
1. Butterfly Counting over Bipartite Graphs with Local Differential Privacy;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13