SURAA: A Novel Method and Tool for Loadbalanced and Coalesced SpMV Computations on GPUs

Published:2019-03-06 Issue:5 Volume:9 Page:947
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Muhammed Thaha, Mehmood Rashid, Albeshri Aiiad, Katib Iyad

Abstract

Sparse matrix-vector (SpMV) multiplication is a vital building block for numerous scientific and engineering applications. This paper proposes SURAA (which translates to "speed" in Arabic), a novel method for SpMV computations on graphics processing units (GPUs). The novelty lies in the way we group matrix rows into different segments, and adaptively schedule various segments to different types of kernels. The sparse matrix data structure is created by sorting the rows of the matrix on the basis of the nonzero elements per row (npr) and forming segments of equal size (containing approximately an equal number of nonzero elements per row) using the Freedman–Diaconis rule. The segments are assembled into three groups based on the mean npr of the segments. For each group, we use multiple kernels to execute the group segments on different streams. Hence, the number of threads to execute each segment is adaptively chosen. Dynamic Parallelism available in Nvidia GPUs is utilized to execute the group containing segments with the largest mean npr, providing improved load balancing and coalesced memory access, and hence more efficient SpMV computations on GPUs. Therefore, SURAA minimizes the adverse effects of the npr variance by uniformly distributing the load using equal sized segments. We implement the SURAA method as a tool and compare its performance with the de facto best commercial (cuSPARSE) and open source (CUSP, MAGMA) tools using widely used benchmarks comprising 26 high npr variance matrices from 13 diverse domains. SURAA outperforms the other tools by delivering 13.99x speedup on average. We believe that our approach provides a fundamental shift in addressing SpMV related challenges on GPUs including coalesced memory access, thread divergence, and load balancing, and is set to open new avenues for further improving SpMV performance in the future.
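
The preprocessing steps named in the abstract (sort rows by npr, cut the sorted rows into segments with the Freedman–Diaconis rule, then place segments into three groups by mean npr) can be illustrated with a short host-side sketch in Python. This is only one plausible reading of the abstract and not the authors' implementation. It assumes that the Freedman–Diaconis bin width h = 2 · IQR · n^(-1/3) of the npr values fixes the number of segments, and that segments hold near-equal numbers of rows. The function names and the group thresholds `small` and `large` are hypothetical. Kernel launches, streams, and Dynamic Parallelism are not shown.

```python
import math

def rows_nnz(row_ptr):
    """Nonzeros per row (npr) from a CSR row-pointer array."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def fd_segment_count(npr):
    """Freedman-Diaconis rule: bin width h = 2 * IQR / n^(1/3).
    Assumption: the segment count is the npr range divided by h."""
    vals = sorted(npr)
    iqr = quantile(vals, 0.75) - quantile(vals, 0.25)
    h = 2.0 * iqr / len(vals) ** (1.0 / 3.0)
    if h == 0:
        return 1  # degenerate case: (almost) all rows have the same npr
    return max(1, math.ceil((vals[-1] - vals[0]) / h))

def build_segments(row_ptr):
    """Sort row indices by npr and split them into near-equal segments."""
    npr = rows_nnz(row_ptr)
    order = sorted(range(len(npr)), key=lambda r: npr[r])  # stable sort
    n = len(order)
    k = min(fd_segment_count(npr), n)
    segments = [order[i * n // k:(i + 1) * n // k] for i in range(k)]
    return segments, npr

def group_segments(segments, npr, small=4, large=32):
    """Assign each segment to one of three groups by its mean npr.
    The thresholds are hypothetical placeholders, not values from the paper."""
    groups = {"small": [], "medium": [], "large": []}
    for seg in segments:
        mean = sum(npr[r] for r in seg) / len(seg)
        key = "small" if mean <= small else "large" if mean > large else "medium"
        groups[key].append(seg)
    return groups

# Toy CSR row pointer for 8 rows with npr = [1, 2, 1, 40, 3, 2, 50, 1]
row_ptr = [0, 1, 3, 4, 44, 47, 49, 99, 100]
segments, npr = build_segments(row_ptr)
groups = group_segments(segments, npr)
```

In this toy matrix the two long rows (indices 3 and 6) end up together in the "large" group, which is the group the abstract says is executed with Dynamic Parallelism, while the short rows fall into segments with similar npr.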

Funder

Deanship of Scientific Research (DSR), King Abdulaziz University

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/9/5/947/pdf

References: 88 articles.

1. The Landscape of Parallel Computing Research: A View from Berkeley;Asanovic,2006

2. Block Conjugate-Gradient Method With Multilevel Preconditioning and GPU Acceleration for FEM Problems in Electromagnetics

3. Parallelization Strategies for Computational Fluid Dynamics Software: State of the Art Review

4. Solving finite-difference equations for diffractive optics problems using graphics processing units

5. Speeding up the high-accuracy surface modelling method with GPU

Cited by 20 articles.

1. Optimal Scheduling for the Performance Optimization of SpMV Computation using Machine Learning Techniques;2024 7th International Conference on Information and Computer Technologies (ICICT);2024-03-15

2. Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach;Journal of Parallel and Distributed Computing;2024-03

3. MANet: An Architecture Adaptive Method for Sparse Matrix Format Selection;Lecture Notes in Computer Science;2024

4. Optimization Techniques for GPU Programming;ACM Computing Surveys;2023-03-16

5. GPU Sparse Matrix Vector Multiplication Optimization Based on ELLB Storage Format;Proceedings of the 2023 12th International Conference on Software and Computer Applications;2023-02-23