Massively Parallel Algorithms for Personalized PageRank

Published:2021-05 Issue:9 Volume:14 Page:1668-1680
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Hou Guanhao¹,Chen Xingguang¹,Wang Sibo¹,Wei Zhewei²

Affiliation:

1. The Chinese University of Hong Kong

2. Renmin University of China

Abstract

Personalized PageRank (PPR) has wide applications in search engines, social recommendations, community detection, and so on. Nowadays, graphs are becoming massive, and many IT companies need to deal with large graphs that cannot fit into the memory of most commodity servers. However, most existing state-of-the-art solutions for PPR computation only work on a single machine and are inefficient in a distributed framework, since such solutions either (i) result in an excessively large number of communication rounds, or (ii) incur high communication costs in each round. Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of rounds while guaranteeing that the load, i.e., the maximum number of messages an executor sends or receives in a round, is bounded by the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds for the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our proposed solution bounds the communication rounds to [EQUATION] under a load of O(m/p), where m is the number of edges of the input graph, p is the number of executors, and ϵ is a user-defined error parameter. Meanwhile, as the number of executors increases to p' = γ · p, the load constraint can be relaxed, since each executor can hold O(γ · m/p') messages with invariant local memory. In such scenarios, multiple queries can be processed simultaneously in batches. We show that with a load of O(γ · m/p'), our Delta-Push can process γ queries in a batch with [EQUATION] rounds, while other baseline solutions still keep the same round cost for each batch. We further present a new top-k algorithm that is friendly to the distributed framework and reduces the number of rounds required in practice. Extensive experiments show that our proposed solution is more efficient than alternatives.
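
The abstract describes combining a push algorithm with Monte-Carlo random walks for single-source PPR. The following is a minimal single-machine sketch of that general idea, not the distributed Delta-Push algorithm itself. All names (`forward_push`, `random_walk`, `approx_ppr`, `r_max`, `walks_per_unit`) are hypothetical, walks are sampled on the fly rather than pre-sampled, and a walk that reaches a node without out-neighbors is assumed to stop there.

```python
import math
import random
from collections import defaultdict


def forward_push(graph, source, alpha, r_max):
    """Push residue until residue[u] <= r_max * out_degree(u) for every node."""
    reserve = defaultdict(float)   # settled PPR mass
    residue = defaultdict(float)   # mass still to be distributed
    residue[source] = 1.0
    frontier = [source]
    while frontier:
        u = frontier.pop()
        out = graph[u]
        r = residue[u]
        # Skip dangling nodes and nodes already below the threshold.
        if not out or r <= r_max * len(out):
            continue
        residue[u] = 0.0
        reserve[u] += alpha * r                 # walk stops at u with prob. alpha
        share = (1.0 - alpha) * r / len(out)    # rest is split among out-neighbors
        for v in out:
            residue[v] += share
            if graph[v] and residue[v] > r_max * len(graph[v]):
                frontier.append(v)
    return reserve, residue


def random_walk(graph, start, alpha, rng):
    """Random walk that stops with prob. alpha per step (assumed: also at dangling nodes)."""
    u = start
    while graph[u] and rng.random() >= alpha:
        u = rng.choice(graph[u])
    return u


def approx_ppr(graph, source, alpha=0.2, r_max=1e-3, walks_per_unit=10000, seed=0):
    """Estimate PPR from `source`: push first, then spend leftover residue on walks."""
    rng = random.Random(seed)
    reserve, residue = forward_push(graph, source, alpha, r_max)
    estimate = dict(reserve)
    for v, r in residue.items():
        if r <= 0.0:
            continue
        n_walks = math.ceil(r * walks_per_unit)
        weight = r / n_walks                    # each walk carries an equal share of r
        for _ in range(n_walks):
            t = random_walk(graph, v, alpha, rng)
            estimate[t] = estimate.get(t, 0.0) + weight
    return estimate


# Adjacency lists; every node must appear as a key.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}
estimate = approx_ppr(graph, source=0)
```

In this sketch a larger `r_max` ends the push phase earlier and leaves more residue for the random walks, which is the trade-off between push work and sampling that such a combination exploits.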

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/3461535.3461554

Cited by 23 articles.

1. Efficient Algorithms for Personalized PageRank Computation: A Survey;IEEE Transactions on Knowledge and Data Engineering;2024-09

2. Fast Computation of Kemeny's Constant for Directed Graphs;Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining;2024-08-24

3. FICOM: an effective and scalable active learning framework for GNNs on semi-supervised node classification;The VLDB Journal;2024-07-22

4. Efficient Approximation of Kemeny's Constant for Large Graphs;Proceedings of the ACM on Management of Data;2024-05-29

5. Personalized PageRanks over Dynamic Graphs - The Case for Optimizing Quality of Service;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13