Optimizing Sparse Matrix-Matrix Multiplication for the GPU

Published: 2015-10-26 Issue: 4 Volume: 41 Page: 1-20
ISSN: 0098-3500
Container-title: ACM Transactions on Mathematical Software
Language: en
Short-container-title: ACM Trans. Math. Softw.

Author:

Dalton Steven¹, Olson Luke¹, Bell Nathan²

Affiliation:

1. University of Illinois at Urbana-Champaign, Urbana, IL

2. Google, Mountain View, CA

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a key operation in numerous areas from information to the physical sciences. Implementing SpGEMM efficiently on throughput-oriented processors, such as the graphics processing unit (GPU), requires the programmer to expose substantial fine-grained parallelism while conserving the limited off-chip memory bandwidth. Balancing these concerns, we decompose the SpGEMM operation into three highly parallel phases: expansion, sorting, and contraction, and introduce a set of complementary bandwidth-saving performance optimizations. Our implementation is fully general and our optimization strategy adaptively processes the SpGEMM workload row-wise to substantially improve performance by decreasing the work complexity and utilizing the memory hierarchy more effectively.
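
The three phases named in the abstract can be illustrated with a minimal sequential sketch in Python. This is not the authors' GPU implementation: the function name spgemm_esc and the dictionary-of-rows input layout are assumptions made for illustration, and none of the paper's bandwidth-saving or row-wise adaptive optimizations are shown.

```python
from itertools import groupby
from operator import itemgetter


def spgemm_esc(a_rows, b_rows):
    """Multiply sparse matrices stored as {row: [(col, value), ...]}.

    Hypothetical helper for illustration only; a sequential sketch of
    the expansion / sorting / contraction decomposition.
    """
    # Expansion: for every nonzero A[i, k], emit one partial product
    # A[i, k] * B[k, j] per nonzero B[k, j] in row k of B.
    products = [
        (i, j, a_ik * b_kj)
        for i, row in a_rows.items()
        for k, a_ik in row
        for j, b_kj in b_rows.get(k, [])
    ]

    # Sorting: order partial products by (row, column) so that entries
    # with the same output coordinate become adjacent.
    products.sort(key=itemgetter(0, 1))

    # Contraction: sum each run of duplicates into one output entry.
    result = {}
    for (i, j), group in groupby(products, key=itemgetter(0, 1)):
        result.setdefault(i, []).append((j, sum(v for _, _, v in group)))
    return result


# A = [[1, 2], [0, 3]] and B = [[4, 0], [5, 6]], nonzeros only.
a = {0: [(0, 1.0), (1, 2.0)], 1: [(1, 3.0)]}
b = {0: [(0, 4.0)], 1: [(0, 5.0), (1, 6.0)]}
c = spgemm_esc(a, b)
```

In this sketch each phase is a flat pass over a list of independent items, which is the property that makes the decomposition attractive for fine-grained parallel hardware. The intermediate list of partial products can be much larger than the final result.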

Publisher

Association for Computing Machinery (ACM)

Subject

Applied Mathematics, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2699470

References: 27 articles.

1. Sparse matrix multiplication package (SMMP)

2. Exposing Fine-Grained Parallelism in Algebraic Multigrid Methods

3. Implementing sparse matrix-vector multiplication on throughput-oriented processors

Cited by 74 articles.

1. CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12

2. Compilation of Modular and General Sparse Workspaces;Proceedings of the ACM on Programming Languages;2024-06-20

3. Optimizing sparse general matrix–matrix multiplication for DCUs;The Journal of Supercomputing;2024-05-30

4. On Efficient Large Sparse Matrix Chain Multiplication;Proceedings of the ACM on Management of Data;2024-05-29

5. A Survey of Accelerating Parallel Sparse Linear Algebra;ACM Computing Surveys;2023-08-28