Affiliation:
1. NVIDIA Corporation
2. University of Virginia
Abstract
Breadth-First Search (BFS) is a core primitive for graph traversal and a basis for many higher-level graph analysis algorithms. It is also representative of a class of parallel computations whose memory accesses and work distribution are both irregular and data dependent. Recent work has demonstrated the plausibility of GPU sparse graph traversal, but has tended to focus on asymptotically inefficient algorithms that perform poorly on graphs with nontrivial diameter.
We present a BFS parallelization focused on fine-grained task management, constructed from efficient prefix-sum computations, that achieves an asymptotically optimal O(|V| + |E|) work complexity. Our implementation delivers excellent performance on diverse graphs, achieving traversal rates in excess of 3.3 billion and 8.3 billion traversed edges per second using single- and quad-GPU configurations, respectively. This level of performance is several times faster than state-of-the-art implementations on both CPU and GPU platforms.
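To make the prefix-sum idea concrete, the following is a minimal sequential sketch, not the authors' GPU implementation. It assumes the graph is stored in compressed sparse row (CSR) form, and the names `exclusive_scan`, `bfs_levels`, `row_offsets`, and `col_indices` are hypothetical, chosen for illustration. Each BFS level uses one exclusive prefix sum over frontier degrees to assign every vertex a private output range for its neighbors, and a second scan over "newly visited" flags to compact the next frontier. On a GPU, the loops marked as parallel would be spread across threads, and the visited check would need care to handle concurrent discoveries of the same vertex.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] = sum(values[:i]). Also returns the total."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out, running


def bfs_levels(row_offsets, col_indices, source):
    """Level-synchronous BFS over a CSR graph; returns depth per vertex (-1 if unreached)."""
    n = len(row_offsets) - 1
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:
        # Scan over degrees: each frontier vertex gets a disjoint output range.
        degrees = [row_offsets[v + 1] - row_offsets[v] for v in frontier]
        offsets, total = exclusive_scan(degrees)

        # Gather neighbors into the edge frontier (parallel over vertices on a GPU).
        gathered = [0] * total
        for i, v in enumerate(frontier):
            start = row_offsets[v]
            for k in range(degrees[i]):
                gathered[offsets[i] + k] = col_indices[start + k]

        # Flag neighbors seen for the first time and label them with the new depth.
        level += 1
        flags = []
        for u in gathered:
            if depth[u] == -1:
                depth[u] = level
                flags.append(1)
            else:
                flags.append(0)

        # Scan over flags: compact the newly visited vertices into the next frontier.
        slots, count = exclusive_scan(flags)
        next_frontier = [0] * count
        for j, u in enumerate(gathered):
            if flags[j]:
                next_frontier[slots[j]] = u
        frontier = next_frontier
    return depth
```

In this sketch every vertex enters a frontier at most once and every edge is gathered at most once, so total work is proportional to |V| + |E|, regardless of how many levels the traversal takes.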
Funder
NVIDIA Graduate Fellowship
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software
Cited by
27 articles.
1. Parallel Implementation of SPHINCS+ With GPUs;IEEE Transactions on Circuits and Systems I: Regular Papers;2024-06
2. GPU-Accelerated BFS for Dynamic Networks;Lecture Notes in Computer Science;2024
3. A Heterogeneous Parallel Computing Approach Optimizing SpTTM on CPU-GPU via GCN;ACM Transactions on Parallel Computing;2023-06-20
4. Traversing Large Compressed Graphs on GPUs;2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2023-05
5. HyTGraph: GPU-Accelerated Graph Processing with Hybrid Transfer Management;2023 IEEE 39th International Conference on Data Engineering (ICDE);2023-04