Affiliation:
1. University of Southern California
2. U.S Army Research Lab
Abstract
Hub labeling is widely used to improve the latency and throughput of Point-to-Point Shortest Distance (PPSD) queries in graph databases. However, constructing hub labeling, even via the state-of-the-art Pruned Landmark Labeling (PLL) algorithm is computationally intensive. PLL further has a sequential root order label dependency that makes it challenging to parallelize. Hence, the existing parallel approaches are often plagued by label size increase, poor scalability and inability to process large weighted graphs.
In this paper, we develop novel algorithms that construct the minimal (guaranteed) Canonical Hub Labeling on shared and distributed-memory parallel systems in a scalable and efficient manner. Our key contribution, the PLaNT algorithm, provides an
embarrassingly parallel
approach for label construction that scales well beyond the limits of current practice. Our approach
is the first
to employ a collaborative label partitioning scheme across multiple nodes of a cluster, for completely in-memory labeling and parallel querying on massive graphs whose labels cannot fit on a single node.
On a single node with 72-threads, our shared-memory algorithm is up to 47.4X faster than sequential PLL. While our labeling time is comparable to the state-of-the-art shared-memory paraPLL, our label size is 17% smaller on average.
PLaNT demonstrates superior parallel scalability. It can process significantly larger graphs and construct labeling orders of magnitude faster than the state-of-the-art distributed paraPLL. Compared to the best shared-memory parallel algorithm, it achieves up to 9.5X speedup on a 64 node cluster.
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
12 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献