Affiliation:
1. Georgia Tech
2. Oak Ridge National Lab
3. New Jersey Institute of Tech
Abstract
Due to the limited capacity of GPU memory, most prior work on graph applications on GPUs has been restricted to graphs of modest size that fit in device memory. Recent hardware and software advances make it possible to address much larger host memory transparently through a feature known as unified virtual memory (UVM). While accessing host memory over an interconnect is understandably slower, this problem space has not been sufficiently explored for challenging workloads with low computational intensity and irregular data access patterns, such as graph traversal. We analyse the performance of breadth-first search (BFS) on several large graphs under unified memory and identify the key factors that contribute to slowdowns. Next, we propose a lightweight offline graph reordering algorithm, HALO (Harmonic Locality Ordering), that can be used as a preprocessing step for static graphs. HALO yields speedups of 1.5x-1.9x over the baseline in subsequent traversals. Our method specifically targets large directed real-world graphs in addition to undirected graphs, whereas prior methods account only for the latter. Additionally, we demonstrate ties between the locality ordering problem and graph compression, and show that prior graph compression methods such as recursive graph bisection can be suitably adapted to this problem.
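To illustrate why vertex ordering matters for BFS under unified memory, the sketch below relabels a small directed graph by BFS visit order and counts how many distinct "pages" of per-vertex data each BFS frontier touches. This is a minimal sketch under stated assumptions, not HALO: the abstract does not describe HALO's algorithm, so BFS-order relabeling is swapped in as a simple, well-known locality ordering. The helper names (`to_csr`, `bfs_levels`, `bfs_order`, `relabel`, `pages_touched`) and the toy page size of 4 vertices are hypothetical, and distinct pages per frontier is only a rough proxy for UVM page migrations.

```python
def to_csr(n, edges):
    """Build a CSR representation (offsets, targets) of a directed graph."""
    offsets = [0] * (n + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    for i in range(n):  # prefix sum turns counts into row offsets
        offsets[i + 1] += offsets[i]
    targets = [0] * len(edges)
    fill = offsets[:-1]  # slicing copies, so offsets is not mutated
    for u, v in edges:
        targets[fill[u]] = v
        fill[u] += 1
    return offsets, targets


def bfs_levels(offsets, targets, src):
    """Return the list of BFS frontiers, one list of vertices per level."""
    n = len(offsets) - 1
    visited = [False] * n
    visited[src] = True
    frontier, levels = [src], []
    while frontier:
        levels.append(frontier)
        nxt = []
        for u in frontier:
            for v in targets[offsets[u]:offsets[u + 1]]:
                if not visited[v]:
                    visited[v] = True
                    nxt.append(v)
        frontier = nxt
    return levels


def bfs_order(offsets, targets, src):
    """Permutation old_id -> new_id numbering vertices in BFS visit order.

    Vertices unreachable from src keep their relative order at the end.
    """
    n = len(offsets) - 1
    order = [v for level in bfs_levels(offsets, targets, src) for v in level]
    seen = set(order)
    order += [v for v in range(n) if v not in seen]
    perm = [0] * n
    for new_id, old_id in enumerate(order):
        perm[old_id] = new_id
    return perm


def relabel(edges, perm):
    """Apply a vertex permutation to an edge list."""
    return [(perm[u], perm[v]) for u, v in edges]


def pages_touched(levels, vertices_per_page):
    """Sum over BFS levels of distinct pages of per-vertex data touched.

    Hypothetical proxy: vertex v's data is assumed to live on page
    v // vertices_per_page.
    """
    return sum(len({v // vertices_per_page for v in level}) for level in levels)


if __name__ == "__main__":
    n = 16
    # A two-level tree whose vertex IDs are scattered across pages.
    edges = [(0, 5), (0, 10), (0, 15),
             (5, 1), (5, 6), (5, 11),
             (10, 2), (10, 7), (10, 12),
             (15, 3), (15, 13), (15, 14)]
    off, tgt = to_csr(n, edges)
    print(pages_touched(bfs_levels(off, tgt, 0), 4))  # 8

    perm = bfs_order(off, tgt, 0)
    off2, tgt2 = to_csr(n, relabel(edges, perm))
    print(pages_touched(bfs_levels(off2, tgt2, perm[0]), 4))  # 5
```

After relabeling, each frontier occupies a contiguous ID range, so the same traversal touches fewer pages (8 vs. 5 in this toy case). Real orderings for large directed graphs, such as the one the abstract proposes, must handle cases where a single BFS order from one source is a poor fit.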
Subject
General Earth and Planetary Sciences,Water Science and Technology,Geography, Planning and Development
Cited by
36 articles.
1. Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02
2. CGgraph: An Ultra-Fast Graph Processing System on Modern Commodity CPU-GPU Co-processor;Proceedings of the VLDB Endowment;2024-02
3. Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory;ACM Transactions on Architecture and Code Optimization;2024-01-18
4. GPU Graph Processing on CXL-Based Microsecond-Latency External Memory;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12
5. Deployment of Real-Time Network Traffic Analysis Using GraphBLAS Hypersparse Matrices and D4M Associative Arrays;2023 IEEE High Performance Extreme Computing Conference (HPEC);2023-09-25