
Grus

Published: 2021-03 Issue: 2 Volume: 18 Page: 1-25
ISSN: 1544-3566
Container-title: ACM Transactions on Architecture and Code Optimization
Language: en
Short-container-title: ACM Trans. Archit. Code Optim.

Author:

Wang Pengyu¹, Wang Jing¹, Li Chao¹, Wang Jianzong², Zhu Haojin¹, Guo Minyi¹

Affiliation:

1. Shanghai Jiao Tong University, Shanghai, China

2. Ping An Technology, Guangdong, China

Abstract

Today’s GPU graph processing frameworks face scalability and efficiency issues as the graph size exceeds the GPU-dedicated memory limit. Although recent GPUs can over-subscribe memory with Unified Memory (UM), they incur significant overhead when handling graph-structured data. In addition, many popular processing frameworks suffer sub-optimal efficiency due to heavy atomic operations when tracking the active vertices. This article presents Grus, a novel system framework that allows GPU graph processing to stay competitive with the ever-growing graph complexity. Grus improves space efficiency through a UM trimming scheme tailored to the data access behaviors of graph workloads. It also uses a lightweight frontier structure to further reduce atomic operations. With an easy-to-use interface that abstracts the above details, Grus shows up to 6.4× average speedup over the state-of-the-art in-memory GPU graph processing framework. It allows one to process large graphs of 5.5 billion edges in seconds with a single GPU.
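The abstract attributes part of the inefficiency of existing frameworks to atomic operations issued while tracking active vertices. The Python sketch below is a generic, hypothetical model of that trade-off; it is not Grus's frontier structure, which this record does not describe. It assumes a level-synchronous BFS and contrasts two common frontier representations: a sparse queue, where a GPU version would issue two atomic read-modify-write operations per newly activated vertex (a compare-and-swap to claim the vertex and an atomic add to reserve a queue slot), and a dense per-vertex flag array, where activation needs only idempotent stores followed by a compaction scan. The function names and the operation-counting model are illustrative assumptions.

```python
"""Hypothetical model of two BFS frontier representations (not Grus's design)."""
from typing import List, Tuple

Graph = List[List[int]]  # graph[u] lists the out-neighbours of vertex u


def bfs_queue_frontier(graph: Graph, source: int) -> Tuple[List[int], int]:
    """Level-synchronous BFS with a sparse queue frontier.

    Returns (depth, atomics). `atomics` models the atomic read-modify-write
    operations a GPU version would issue: one compare-and-swap to claim each
    newly discovered vertex and one atomic add on the shared queue tail to
    reserve its slot. Extra CAS retries caused by contention are not modelled.
    """
    depth = [-1] * len(graph)
    depth[source] = 0
    frontier = [source]
    atomics = 0
    level = 0
    while frontier:
        next_frontier: List[int] = []
        for u in frontier:
            for v in graph[u]:
                if depth[v] == -1:
                    depth[v] = level + 1     # GPU: atomicCAS(&depth[v], -1, level + 1)
                    next_frontier.append(v)  # GPU: slot = atomicAdd(&tail, 1)
                    atomics += 2
        frontier = next_frontier
        level += 1
    return depth, atomics


def bfs_flag_frontier(graph: Graph, source: int) -> Tuple[List[int], int]:
    """Level-synchronous BFS with a dense per-vertex flag frontier.

    Returns (depth, scanned). `scanned` counts flag entries read by the
    compaction pass. Activations are plain idempotent stores: racing writers
    in the same level all store the same values, so no read-modify-write
    atomic is needed. The cost is an O(n) scan per level.
    """
    n = len(graph)
    depth = [-1] * n
    depth[source] = 0
    frontier = [source]
    scanned = 0
    level = 0
    while frontier:
        active = [False] * n  # one flag per vertex, cleared each level
        for u in frontier:
            for v in graph[u]:
                if depth[v] == -1:
                    depth[v] = level + 1  # every racing writer stores level + 1
                    active[v] = True      # every racing writer stores True
        # Compaction pass (a prefix sum / stream compaction on a GPU).
        frontier = [v for v in range(n) if active[v]]
        scanned += n
        level += 1
    return depth, scanned


if __name__ == "__main__":
    # Edges 0->1, 0->2, 1->3, 2->3, 3->4; vertex 5 is unreachable.
    g = [[1, 2], [3], [3], [4], [], []]
    print(bfs_queue_frontier(g, 0))
    print(bfs_flag_frontier(g, 0))
```

On the six-vertex example both functions return the depths [0, 1, 1, 2, 3, -1]; the queue model reports 8 atomic operations, while the flag model reports 24 scanned flags (four levels over six vertices). Under this model, the flag array removes atomics at the cost of a per-level scan that grows with the vertex count. Which representation wins depends on frontier density.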

Funder

National Key Research and Development Program of China

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3444844

References (70 articles)

1. Andy Adinets. 2014. CUDA Pro Tip: Optimized Filtering with Warp-Aggregated Atomics. NVIDIA Developer Blog. Retrieved from https://developer.nvidia.com/blog/cuda-pro-tip-optimized-filtering-warp-aggregated-atomics/.

2. Mosaic

3. MASK

4. Groute

5. SlimSell: A Vectorizable Graph Representation for Breadth-First Search

Cited by 22 articles.

1. GRIT: Enhancing Multi-GPU Performance with Fine-Grained Dynamic Page Placement;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

2. CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor;Proceedings of the VLDB Endowment;2024-02

3. Efficient GPU Implementation of Static and Incrementally Expanding DF-P PageRank for Dynamic Graphs;2024

4. Harnessing Data Movement Strategies to Optimize Performance-Energy Efficiency of Oil & Gas Simulations in HPC;Lecture Notes in Computer Science;2024

5. Optimizing GPU-Based Graph Sampling and Random Walk for Efficiency and Scalability;IEEE Transactions on Computers;2023-09-01