Efficient k-Clique Count Estimation with Accuracy Guarantee

Author:

Chang Lijun (1), Gamage Rashmika (1), Yu Jeffrey Xu (2)

Affiliation:

1. The University of Sydney, Sydney, Australia

2. The Chinese University of Hong Kong, Hong Kong, China

Abstract

Counting and enumerating all occurrences of k-cliques, i.e., complete subgraphs with k vertices, in a large graph G is a fundamental problem with many applications. However, exact solutions are often infeasible because the number of k-cliques grows exponentially as k increases. A more practical approach is therefore to approximately count and uniformly sample k-cliques. Turán-Shadow and DPColorPath are two state-of-the-art algorithms for approximately counting k-cliques. Their general idea is to first construct a sample space that is a superset of all k-cliques in G, and then sample t elements uniformly at random (u.a.r.) from the sample space for a pre-determined t. The k-clique count is estimated as the sample space size multiplied by the fraction of k-cliques among the t samples. Although Turán-Shadow provides techniques for setting t to ensure estimation accuracy, the theoretically chosen t is often too large to be practical. As a result, both existing algorithms use a fixed t in their implementations and thus offer no accuracy guarantee. In this paper, we propose the first randomized algorithm that achieves theoretical estimation accuracy and practical efficiency at the same time. Unlike the existing algorithms, we pre-determine the number s of k-clique samples required to achieve the estimation accuracy. Consequently, for a given sample space, we can estimate the running time of the sampling stage (i.e., the time taken to sample s k-cliques). We then balance the time spent constructing/refining the sample space against the time of the sampling stage, stopping the refinement once the elapsed time is comparable to the estimated time of the sampling stage. Extensive empirical studies on large real graphs show that our algorithm SR-kCCE provides accurate k-clique count estimates and also runs efficiently.
As a by-product, our algorithm can also be used to efficiently sample a given number of k-cliques u.a.r. from G.
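The sketch below illustrates the estimation scheme described in the abstract. It samples u.a.r. from a superset of the k-cliques, keeps sampling until a pre-determined number s of k-cliques has been seen, and scales the sample space size by the observed hit rate. It is not SR-kCCE itself. The sample space used here (each vertex v paired with the (k-1)-subsets of its out-neighbours under a degree ordering) is a deliberately simple, hypothetical stand-in for the refined spaces built by Turán-Shadow or DPColorPath. Fixing s with the Dagum, Karp, Luby and Ross stopping-rule threshold is likewise an assumption about how s might be chosen, not a claim about the paper's exact bound. All function names (`orient`, `stopping_rule_threshold`, `estimate_k_cliques`) and the `max_trials` guard are hypothetical.

```python
import math
import random
from itertools import accumulate, combinations


def orient(adj):
    """Direct each edge towards the endpoint with higher (degree, id) rank, so every
    k-clique's lowest-ranked vertex has the other k-1 vertices as out-neighbours."""
    rank = {v: (len(adj[v]), v) for v in adj}
    return {v: sorted(u for u in adj[v] if rank[u] > rank[v]) for v in adj}


def stopping_rule_threshold(eps, delta):
    """Assumed choice of s: the Upsilon_1 threshold of the Dagum-Karp-Luby-Ross
    stopping rule, intended for 0 < eps < 1 and 0 < delta < 1."""
    upsilon = 4 * (math.e - 2) * math.log(2 / delta) / eps ** 2
    return 1 + (1 + eps) * upsilon


def estimate_k_cliques(adj, k, eps=0.1, delta=0.05, rng=None, max_trials=10**7):
    """Return (estimated k-clique count, list of k-cliques sampled u.a.r.).

    adj maps each vertex (mutually comparable ids, e.g. ints) to a set of neighbours.
    """
    if rng is None:
        rng = random.Random()
    out = orient(adj)
    # Hypothetical sample space: pairs (v, S), S a (k-1)-subset of v's out-neighbours.
    # Each k-clique appears exactly once (v = its lowest-ranked vertex).
    verts = [v for v in out if len(out[v]) >= k - 1]
    cum = list(accumulate(math.comb(len(out[v]), k - 1) for v in verts))
    if not cum:
        return 0.0, []
    space_size = cum[-1]
    upsilon1 = stopping_rule_threshold(eps, delta)
    s = math.ceil(upsilon1)  # number of k-clique samples, fixed before sampling starts
    trials, samples = 0, []
    while len(samples) < s:
        trials += 1
        if trials > max_trials:
            raise RuntimeError("too few k-cliques in the sample space")
        # Pick v with probability C(d+(v), k-1) / |space|, then a uniform subset,
        # so every element of the sample space is drawn with probability 1/|space|.
        v = rng.choices(verts, cum_weights=cum)[0]
        rest = rng.sample(out[v], k - 1)
        if all(b in adj[a] for a, b in combinations(rest, 2)):
            samples.append(frozenset([v, *rest]))  # accepted samples are u.a.r. k-cliques
    # Estimate = |space| * (hit rate), with the stopping rule's Upsilon_1 / trials.
    return space_size * upsilon1 / trials, samples
```

In this sketch, the number of trials is roughly s/p, where p is the fraction of the sample space occupied by k-cliques. Fixing s therefore makes the sampling time predictable for a given sample space. It also shows why refining the space (raising p) pays off only while the refinement costs less than the sampling time it saves.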

Publisher

Association for Computing Machinery (ACM)

