Sparsity: Optimization Framework for Sparse Matrix Kernels

Published:2004-02 Issue:1 Volume:18 Page:135-158
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en

Author:

Im Eun-Jin¹, Yelick Katherine, Vuduc Richard²

Affiliation:

1. SCHOOL OF COMPUTER SCIENCE KOOKMIN UNIVERSITY, SEOUL, KOREA

2. COMPUTER SCIENCE DIVISION UNIVERSITY OF CALIFORNIA, BERKELEY, CA, USA

Abstract

Sparse matrix–vector multiplication is an important computational kernel that performs poorly on most modern processors due to a low compute-to-memory ratio and irregular memory access patterns. Optimization is difficult because of the complexity of cache-based memory systems and because performance is highly dependent on the non-zero structure of the matrix. The SPARSITY system is designed to address these problems by allowing users to automatically build sparse matrix kernels that are tuned to their matrices and machines. SPARSITY combines traditional techniques such as loop transformations with data structure transformations and optimization heuristics that are specific to sparse matrices. It provides a novel framework for selecting optimization parameters, such as block size, using a combination of performance models and search. In this paper we discuss the optimization of two operations: a sparse matrix times a dense vector and a sparse matrix times a set of dense vectors. Our experience indicates that register level optimizations are effective for matrices arising in certain scientific simulations, in particular finite-element problems. Cache level optimizations are important when the vector used in multiplication is larger than the cache size, especially for matrices in which the non-zero structure is random. For applications involving multiple vectors, reorganizing the computation to perform the entire set of multiplications as a single operation produces significant speedups. We describe the different optimizations and parameter selection techniques and evaluate them on several machines using over 40 matrices taken from a broad set of application domains. Our results demonstrate speedups of up to 4× for the single vector case and up to 10× for the multiple vector case.
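The kernels named in the abstract can be illustrated with a minimal sketch. The code below is not the SPARSITY implementation; it only shows, under simplifying assumptions, a baseline CSR matrix-vector product, a 2×2 register-blocked (BCSR) variant that stores explicit zeros to fill each block, and a multiple-vector product that reuses each matrix entry across all vectors. All function names are hypothetical, the block size is fixed at 2×2 rather than selected by a model or search, and the matrix dimensions are assumed to be even.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline: y = A @ x with A stored in CSR."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]  # irregular access into x
        y[i] = acc
    return y


def csr_to_bcsr2x2(row_ptr, col_idx, vals, n_cols):
    """Convert CSR to 2x2 BCSR, padding partial blocks with explicit zeros.

    Assumes even row and column counts (an assumption of this sketch).
    """
    n_rows = len(row_ptr) - 1
    assert n_rows % 2 == 0 and n_cols % 2 == 0
    b_row_ptr, b_col_idx, b_vals = [0], [], []
    for bi in range(n_rows // 2):
        blocks = {}  # block column -> [a00, a01, a10, a11]
        for r in (0, 1):
            i = 2 * bi + r
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                blk = blocks.setdefault(j // 2, [0.0] * 4)
                blk[2 * r + j % 2] += vals[k]
        for bj in sorted(blocks):
            b_col_idx.append(bj)
            b_vals.extend(blocks[bj])
        b_row_ptr.append(len(b_col_idx))
    return b_row_ptr, b_col_idx, b_vals


def spmv_bcsr2x2(b_row_ptr, b_col_idx, b_vals, x):
    """Register-blocked product: one column index per 2x2 block,
    with two accumulators and two x values held in locals."""
    y = [0.0] * (2 * (len(b_row_ptr) - 1))
    for bi in range(len(b_row_ptr) - 1):
        y0 = y1 = 0.0
        for k in range(b_row_ptr[bi], b_row_ptr[bi + 1]):
            j = 2 * b_col_idx[k]
            x0, x1 = x[j], x[j + 1]
            a = 4 * k
            y0 += b_vals[a] * x0 + b_vals[a + 1] * x1
            y1 += b_vals[a + 2] * x0 + b_vals[a + 3] * x1
        y[2 * bi], y[2 * bi + 1] = y0, y1
    return y


def spmm_csr(row_ptr, col_idx, vals, X):
    """Multiple vectors: Y = A @ X, where X[j] holds entry j of every vector,
    so each matrix value is loaded once and applied to all vectors."""
    n_vec = len(X[0])
    Y = [[0.0] * n_vec for _ in range(len(row_ptr) - 1)]
    for i in range(len(Y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            a, x_row = vals[k], X[col_idx[k]]
            for v in range(n_vec):
                Y[i][v] += a * x_row[v]
    return Y
```

For a matrix with 7 non-zeros whose structure only partly matches 2×2 blocks, the conversion may store 12 values (a fill ratio of 12/7). That trade-off between fewer index loads and extra stored zeros is why the choice of block size depends on both the matrix and the machine.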

Publisher

SAGE Publications

Subject

Hardware and Architecture,Theoretical Computer Science,Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342004041296

References: 5 articles.

1. Using Linear Algebra for Intelligent Information Retrieval

2. A Shifted Block Lanczos Algorithm for Solving Sparse Symmetric Generalized Eigenproblems

3. Development of a block Lanczos algorithm for free vibration analysis of spinning structures

4. A block Arnoldi-Chebyshev method for computing the leading eigenpairs of large sparse unsymmetric matrices

5. Block-Arnoldi and Davidson methods for unsymmetric large eigenvalue problems

Cited by 189 articles.

1. SpChar: Characterizing the sparse puzzle via decision trees;Journal of Parallel and Distributed Computing;2024-10

2. Dedicated Hardware Accelerators for Processing of Sparse Matrices and Vectors: A Survey;ACM Transactions on Architecture and Code Optimization;2024-02-15

3. Sparse Matrix-Vector Product for the bmSparse Matrix Format in GPUs;Lecture Notes in Computer Science;2024

4. SPC5: An efficient SpMV framework vectorized using ARM SVE and x86 AVX-512;Computer Science and Information Systems;2024

5. Efficiently Running SpMV on Multi-Core DSPs for Block Sparse Matrix;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17