ClipSim: A GPU-friendly Parallel Framework for Single-Source SimRank with Accuracy Guarantee

Published:2023-05-26 Issue:1 Volume:1 Page:1-26
ISSN:2836-6573
Container-title:Proceedings of the ACM on Management of Data
language:en
Short-container-title:Proc. ACM Manag. Data

Author:

Wu Tianhao¹, Cheng Ji², Zhang Chaorui³, Hou Jianfeng³, Chen Gengjian⁴, Huang Zhongyi¹, Zhang Weixi³, Han Wei³, Bai Bo³

Affiliation:

1. Tsinghua University, Beijing, China

2. Hong Kong University of Science and Technology, Hong Kong, China

3. Theory Lab, 2012 Labs & Huawei Technologies, Co. Ltd, Hong Kong, China

4. Wuhan University, Wuhan, China

Abstract

SimRank is an important metric for measuring the topological similarity between two nodes in a graph. In particular, single-source and top-k SimRank queries have numerous applications in recommendation systems, network analysis, and web mining. Mathematically, given a vertex, single-machine computation of single-source SimRank consists mainly of matrix-matrix operations, which are almost impossible to carry out directly on large graphs. Existing works therefore resort to two main operations: a series of random walks, and multiplications of a sparse matrix by a dense vector. This incurs high computation cost for SimRank on large graphs. In real-world applications, there is always a trade-off between query time and accuracy, which hinders the computation of high-precision SimRank on large-scale graphs. To handle this problem, this paper proposes ClipSim, the first GPU-friendly parallel framework that accelerates single-source SimRank on GPU with an accuracy guarantee. We design a novel data structure and GPU-friendly parallel algorithms for efficient computation of all the operations of SimRank on GPU. Moreover, our theoretical derivation enables ClipSim to largely reduce the number of random walks required for each node while maintaining the same theoretical accuracy as the state-of-the-art algorithm, ExactSim. We conduct extensive experiments on real-world and synthetic datasets to demonstrate the accuracy and efficiency of ClipSim. The results show that, compared with ExactSim, ClipSim obtains single-source SimRank vectors with the same accuracy while running up to 160× faster.
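
For readers unfamiliar with the metric, the following is a minimal sketch of the textbook SimRank recurrence (Jeh and Widom), not ClipSim's algorithm or data structure. It assumes the graph is given as a hypothetical `in_neighbors` adjacency list and uses an assumed decay factor `c = 0.6`; the function name and parameters are illustrative only. Because it keeps a full n × n similarity table, it also shows why the direct approach does not scale to large graphs.

```python
def simrank_single_source(in_neighbors, source, c=0.6, iterations=10):
    """Naive SimRank by fixed-point iteration; returns the row for `source`.

    in_neighbors[v] lists the nodes with an edge into v (assumed input format).
    Memory is O(n^2), so this is only practical for very small graphs.
    """
    n = len(in_neighbors)
    # Start from the identity: every node is fully similar to itself.
    sim = [[1.0 if u == v else 0.0 for v in range(n)] for u in range(n)]
    for _ in range(iterations):
        nxt = [[0.0] * n for _ in range(n)]
        for u in range(n):
            for v in range(n):
                if u == v:
                    nxt[u][v] = 1.0
                elif in_neighbors[u] and in_neighbors[v]:
                    # Average similarity over all pairs of in-neighbors,
                    # scaled by the decay factor c.
                    total = sum(sim[a][b]
                                for a in in_neighbors[u]
                                for b in in_neighbors[v])
                    nxt[u][v] = c * total / (
                        len(in_neighbors[u]) * len(in_neighbors[v]))
                # Pairs where either node has no in-neighbors stay at 0.
        sim = nxt
    return sim[source]


# Tiny example: node 0 points to nodes 1 and 2, which therefore share
# their only in-neighbor.
toy_in_neighbors = [[], [0], [0]]
toy_scores = simrank_single_source(toy_in_neighbors, source=1)
```

In this toy graph, nodes 1 and 2 have the same single in-neighbor, so their similarity equals the decay factor, while node 0 has no in-neighbors and scores 0 against node 1.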

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3588707
