
BestSF

Published:2018-10-08 Issue:3 Volume:15 Page:1-27
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Benatia Akrem¹, Ji Weixing¹, Wang Yizhuo¹, Shi Feng¹

Affiliation:

1. Beijing Institute of Technology, Beijing, China

Abstract

The Sparse Matrix-Vector Multiplication (SpMV) kernel dominates the computing cost in numerous scientific applications. Many implementations based on different sparse formats were proposed to improve this kernel on the recent GPU architectures. However, it has been widely observed that there is no “best-for-all” sparse format for the SpMV kernel on GPU. Indeed, serious performance degradation of an order of magnitude can be observed without a careful selection of the sparse format to use. To address this problem, we propose in this article BestSF (Best Sparse Format), a new learning-based sparse meta-format that automatically selects the most appropriate sparse format for a given input matrix. To do so, BestSF relies on a cost-sensitive classification system trained using Weighted Support Vector Machines (WSVMs) to predict the best sparse format for each input sparse matrix. Our experimental results on two different NVIDIA GPU architectures using a large number of real-world sparse matrices show that BestSF achieved a noticeable overall performance improvement over using a single sparse format. While BestSF is trained to select the best sparse format in terms of performance (GFLOPS), our further experimental investigations revealed that using BestSF also led, in most of the test cases, to the best energy efficiency (MFLOPS/W). To prove its practical effectiveness, we also evaluate the performance and energy efficiency improvement achieved when using BestSF as a building block in a GPU-based Preconditioned Conjugate Gradient (PCG) iterative solver.
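The abstract's claim that no sparse format is best for all matrices can be illustrated with a small sketch. The example below is not taken from the article: it assumes two common formats, CSR and ELL (the abstract names no specific formats), and all function names are hypothetical. It runs the same SpMV in both formats and reports how much padding ELL needs when row lengths are irregular, which is one reason a format can perform poorly on a given matrix. The `gflops` helper uses the usual convention of 2 floating-point operations per stored nonzero.

```python
# Illustrative sketch only: CSR and ELL are assumed example formats,
# and every function name here is hypothetical.
# Assumes a non-empty matrix with at least one nonzero entry.

def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x with A stored in CSR."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def dense_to_ell(dense):
    """Convert to ELL: every row is padded to the longest row's length."""
    width = max(sum(1 for v in row if v != 0) for row in dense)
    ell_vals, ell_cols = [], []
    for row in dense:
        entries = [(v, j) for j, v in enumerate(row) if v != 0]
        pad = width - len(entries)
        ell_vals.append([v for v, _ in entries] + [0.0] * pad)
        ell_cols.append([j for _, j in entries] + [0] * pad)
    return ell_vals, ell_cols, width


def spmv_ell(ell_vals, ell_cols, x):
    """Compute y = A @ x with A stored in ELL (padding contributes zero)."""
    return [
        sum(v * x[j] for v, j in zip(vals, cols))
        for vals, cols in zip(ell_vals, ell_cols)
    ]


def ell_padding_ratio(dense):
    """Stored ELL slots divided by true nonzeros (1.0 means no padding)."""
    nnz = sum(1 for row in dense for v in row if v != 0)
    _, _, width = dense_to_ell(dense)
    return (len(dense) * width) / nnz


def gflops(nnz, seconds):
    """SpMV throughput, counting one multiply and one add per nonzero."""
    return 2.0 * nnz / seconds / 1e9


if __name__ == "__main__":
    # One long row forces ELL to pad every other row.
    A = [
        [1, 0, 0, 0],
        [0, 2, 0, 0],
        [3, 4, 5, 6],
        [0, 0, 0, 7],
    ]
    x = [1, 1, 1, 1]
    print(spmv_csr(*dense_to_csr(A), x))
    vals, cols, _ = dense_to_ell(A)
    print(spmv_ell(vals, cols, x))
    print(ell_padding_ratio(A))
```

Both formats produce the same result vector, but ELL stores 16 slots for 7 nonzeros here. A selector such as the one the abstract describes would have to learn from matrix features of this kind; the classifier itself is not sketched here.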

Funder

National Key R&D Program of China

National Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3226228

Reference: 58 articles.

1. Hartwig Anzt, Marc Baboulin, Jack Dongarra, Yvan Fournier, Frank Hulsemann, Amal Khabou, and Yushan Wang. 2016. Accelerating the conjugate gradient algorithm with GPU in CFD simulations. VECPAR.

2. Hartwig Anzt, Mark Gates, Jack Dongarra, Moritz Kreutzer, Gerhard Wellein, and Martin Köhler. 2017. Preconditioned Krylov solvers on GPUs. Parallel Comput. (2017). 10.1016/j.parco.2017.05.006

3. Energy efficiency and performance frontiers for sparse computations on GPU supercomputers

4. Fast Sparse Matrix-Vector Multiplication on GPUs for Graph Applications

5. An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs

Cited by 22 articles.

1. Optimization of Large-Scale Sparse Matrix-Vector Multiplication on Multi-GPU Systems;ACM Transactions on Architecture and Code Optimization;2024-07-08

2. Matrix-free SBP-SAT finite difference methods and the multigrid preconditioner on GPUs;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30

3. Revisiting thread configuration of SpMV kernels on GPU: A machine learning based approach;Journal of Parallel and Distributed Computing;2024-03

4. A Survey of Accelerating Parallel Sparse Linear Algebra;ACM Computing Surveys;2023-08-28

5. An adaptive approach for compression format based on bagging algorithm;International Journal of Parallel, Emergent and Distributed Systems;2023-07-13