Load-balancing Sparse Matrix Vector Product Kernels on GPUs

Published:2020-04-02 Issue:1 Volume:7 Page:1-26
ISSN:2329-4949
Container-title:ACM Transactions on Parallel Computing
language:en
Short-container-title:ACM Trans. Parallel Comput.

Author:

Anzt Hartwig¹,Cojean Terry²,Chen Yen-Chen³,Dongarra Jack⁴,Flegar Goran⁵,Nayak Pratik²,Tomov Stanimire⁶,Tsai Yuhsiang M.³,Wang Weichung³

Affiliation:

1. Karlsruhe Institute of Technology, Germany and University of Tennessee, USA

2. Karlsruhe Institute of Technology, Germany

3. National Taiwan University, Taiwan

4. University of Tennessee, Oak Ridge National Lab, and University of Manchester, UK

5. University of Jaume I, Spain

6. University of Tennessee, USA

Abstract

Efficient processing of irregular matrices on Single Instruction, Multiple Data (SIMD)-type architectures is a persistent challenge. Resolving it requires innovations in the development of data formats, computational techniques, and implementations that strike a balance between thread divergence, which is inherent for irregular matrices, and padding, which alleviates the performance-detrimental thread divergence but introduces artificial overheads. To this end, in this article, we address the challenge of designing high-performance sparse matrix-vector product (SpMV) kernels for NVIDIA Graphics Processing Units (GPUs). We present a compressed sparse row (CSR) format suitable for unbalanced matrices. We also provide a load-balancing kernel for the coordinate (COO) matrix format and extend it to a hybrid algorithm that stores part of the matrix in the SIMD-friendly ELLPACK (ELL) format. The ratio between the ELL and COO parts is determined using a theoretical analysis of the nonzeros-per-row distribution. For the over 2,800 test matrices available in the SuiteSparse Matrix Collection, we compare the performance against SpMV kernels provided by NVIDIA's cuSPARSE library and a heavily tuned sliced ELL (SELL-P) kernel that prevents unnecessary padding by considering the irregular matrices as a combination of matrix blocks stored in ELL format.
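
The hybrid ELL/COO idea described in the abstract can be illustrated with a small, sequential Python sketch. It is not the article's GPU kernel: the function names (`csr_spmv`, `split_hybrid`, `hybrid_spmv`) are hypothetical, and the rule used to pick the ELL width (a quantile of the nonzeros-per-row distribution) is an assumption made for illustration, standing in for the theoretical analysis the authors refer to. Rows are cut at the chosen width; the first entries of each row go into a zero-padded ELL block, and the overflow of long rows is stored as COO triples.

```python
def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference CSR product y = A @ x, processed one row at a time."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def split_hybrid(row_ptr, col_idx, vals, quantile=0.5):
    """Split a CSR matrix into a padded ELL part and a COO remainder.

    ASSUMPTION: the ELL width is the row length found at `quantile` of the
    sorted nonzeros-per-row distribution. This is an illustrative heuristic,
    not necessarily the rule derived in the article.
    """
    n_rows = len(row_ptr) - 1
    lengths = sorted(row_ptr[i + 1] - row_ptr[i] for i in range(n_rows))
    width = lengths[min(n_rows - 1, int(quantile * n_rows))] if n_rows else 0

    # ELL part: every row has exactly `width` slots; unused slots hold an
    # explicit zero value with column index 0 (the padding overhead).
    ell_cols = [[0] * width for _ in range(n_rows)]
    ell_vals = [[0.0] * width for _ in range(n_rows)]
    coo = []  # (row, column, value) triples for entries beyond `width`

    for i in range(n_rows):
        for j, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            if j < width:
                ell_cols[i][j] = col_idx[k]
                ell_vals[i][j] = vals[k]
            else:
                coo.append((i, col_idx[k], vals[k]))
    return width, ell_cols, ell_vals, coo


def hybrid_spmv(width, ell_cols, ell_vals, coo, x):
    """Compute y = A @ x from the ELL part plus the COO remainder."""
    y = [0.0] * len(ell_cols)
    # Regular part: each row performs the same number of operations,
    # which is what makes ELL friendly to SIMD-style execution.
    for i in range(len(y)):
        for j in range(width):
            y[i] += ell_vals[i][j] * x[ell_cols[i][j]]
    # Irregular part: one update per stored triple, independent of row length.
    for i, c, v in coo:
        y[i] += v * x[c]
    return y


# A 4x4 matrix with one long row (row 2) and three rows of length 1.
row_ptr = [0, 1, 2, 6, 7]
col_idx = [0, 1, 0, 1, 2, 3, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
x = [1.0, 2.0, 3.0, 4.0]

width, ell_cols, ell_vals, coo = split_hybrid(row_ptr, col_idx, vals)
assert hybrid_spmv(width, ell_cols, ell_vals, coo, x) == csr_spmv(row_ptr, col_idx, vals, x)
```

With the assumed quantile of 0.5, the example matrix gets an ELL width of 1, so no padding is needed and the three extra entries of the long row land in the COO part. A larger quantile shifts more entries into ELL at the cost of more padded zeros, which is the trade-off between padding and divergence that the abstract describes.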

Funder

Helmholtz Association

U.S. Department of Energy

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Theory and Mathematics,Computer Science Applications,Hardware and Architecture,Modelling and Simulation,Software

Link

https://dl.acm.org/doi/pdf/10.1145/3380930

Cited by 27 articles.

1. CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12

2. Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems;ACM Transactions on Architecture and Code Optimization;2024-07-08

3. Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach;Journal of Parallel and Distributed Computing;2024-03

4. HASpMV: Heterogeneity-Aware Sparse Matrix-Vector Multiplication on Modern Asymmetric Multicore Processors;2023 IEEE International Conference on Cluster Computing (CLUSTER);2023-10-31

5. Connectivity-Aware Link Analysis for Skewed Graphs;Proceedings of the 52nd International Conference on Parallel Processing;2023-08-07