Affiliation:
1. University of Texas, Austin, TX, USA
2. Texas State University, San Marcos, TX, USA
Abstract
Graphics Processing Units (GPUs) have emerged as powerful accelerators for many regular algorithms that operate on dense arrays and matrices. In contrast, we know relatively little about using GPUs to accelerate highly irregular algorithms that operate on pointer-based data structures such as graphs. For the most part, research has focused on GPU implementations of graph analysis algorithms that do not modify the structure of the graph, such as algorithms for breadth-first search and strongly-connected components.
In this paper, we describe a high-performance GPU implementation of an important graph algorithm used in compilers such as gcc and LLVM: Andersen-style inclusion-based points-to analysis. This algorithm is challenging to parallelize effectively on GPUs because it makes extensive modifications to the structure of the underlying graph and performs relatively little computation. In spite of this, our program, when executed on a 14 Streaming Multiprocessor GPU, achieves an average speedup of 7x compared to a sequential CPU implementation and outperforms a parallel implementation of the same algorithm running on 16 CPU cores.
Our implementation provides general insights into how to produce high-performance GPU implementations of graph algorithms, and it highlights key differences between optimizing parallel programs for multicore CPUs and for GPUs.
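To make concrete why inclusion-based points-to analysis keeps modifying its graph, the following is a minimal sequential sketch of a worklist formulation in Python. It is not the GPU implementation described in this paper. The function name `andersen`, the tuple-based constraint encoding, and the variable names are assumptions made for illustration only. Load and store constraints add new copy edges while the analysis runs, which is the structural modification the abstract refers to.

```python
from collections import defaultdict, deque

def andersen(address_of, copies, loads, stores):
    """Sequential sketch of inclusion-based points-to analysis.

    Hypothetical constraint encoding (illustration only):
      address_of: (p, a) for  p = &a
      copies:     (p, q) for  p = q
      loads:      (p, q) for  p = *q
      stores:     (p, q) for  *p = q
    """
    pts = defaultdict(set)    # points-to set of each variable
    succ = defaultdict(set)   # edge q -> p means pts(q) is a subset of pts(p)

    for p, a in address_of:
        pts[p].add(a)
    for p, q in copies:
        succ[q].add(p)

    loads_by_ptr = defaultdict(list)    # q -> [p, ...] for p = *q
    stores_by_ptr = defaultdict(list)   # p -> [q, ...] for *p = q
    for p, q in loads:
        loads_by_ptr[q].append(p)
    for p, q in stores:
        stores_by_ptr[p].append(q)

    worklist = deque(list(pts))
    while worklist:
        n = worklist.popleft()
        # Graph modification: each pointee v of n may induce new edges.
        for v in list(pts[n]):
            for p in loads_by_ptr[n]:       # p = *n  adds edge v -> p
                if p not in succ[v]:
                    succ[v].add(p)
                    worklist.append(v)
            for q in stores_by_ptr[n]:      # *n = q  adds edge q -> v
                if v not in succ[q]:
                    succ[q].add(v)
                    worklist.append(q)
        # Propagation: push pts(n) along outgoing edges.
        for m in list(succ[n]):
            if not pts[n] <= pts[m]:
                pts[m] |= pts[n]
                worklist.append(m)

    return {var: set(s) for var, s in pts.items() if s}

# Usage: p = &a; q = &b; *p = q; r = *p; s = r
result = andersen(
    address_of=[("p", "a"), ("q", "b")],
    copies=[("s", "r")],
    loads=[("r", "p")],
    stores=[("p", "q")],
)
print(result)
```

Each iteration does only a few set operations per node, yet it may insert edges anywhere in the graph. This matches the combination of little computation and frequent structural updates that the abstract identifies as the obstacle to parallelization.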
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
53 articles.