Affiliation:
1. Université Grenoble Alpes, CEA, LIST, Grenoble, France
2. Université Grenoble Alpes, CNRS, Grenoble INP, TIMA, Saint-Martin-d'Heres, France
Abstract
Performance in scientific and engineering applications such as computational physics, algebraic graph problems, or Convolutional Neural Networks (CNNs) is dominated by the manipulation of large sparse matrices (matrices with a large number of zero elements). Specialized software using dedicated sparse-matrix data formats has been optimized for the main kernels of interest, the SpMV and SpMSpM matrix multiplications, but because of indirect memory accesses, performance is still limited by the memory hierarchy of conventional computers. Recent work shows that specific hardware accelerators can reduce memory traffic and improve the execution time of sparse matrix multiplication compared to the best software implementations. The performance of these sparse hardware accelerators depends on the choice of sparse format (COO, CSR, etc.), the algorithm (inner-product, outer-product, Gustavson), and many hardware design choices. In this article, we propose a systematic survey that identifies the design choices of state-of-the-art accelerators for sparse matrix multiplication kernels. We introduce the necessary concepts and then present, compare, and classify the main sparse accelerators in the literature, using consistent notations. Finally, we propose a taxonomy for these accelerators to help future designers make the best choices depending on their objectives.
Publisher
Association for Computing Machinery (ACM)
References (93 articles)
1. Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk van der Vorst. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Society for Industrial and Applied Mathematics.
2. James Alfred Ang, Brian W. Barrett, Kyle Bruce Wheeler, and Richard C. Murphy. 2010. Introducing the Graph 500. Office of Scientific and Technical Information, United States.
3. Bahar Asgari, Ramyad Hadidi, Tushar Krishna, Hyesoon Kim, and Sudhakar Yalamanchili. 2020. ALRESCHA: A lightweight reconfigurable sparse-computation accelerator. In Proceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). 249–260.
4. Venkitesh Ayyar, Evan Weinberg, Richard C. Brower, M .A. Clark, and Mathias Wagner. 2023. Optimizing staggered multigrid for exascale performance. In Proceedings of the 39th International Symposium on Lattice Field Theory — PoS (LATTICE2022). 335.
5. Parallel Triangle Counting and Enumeration Using Matrix Algebra
Cited by 2 articles.
1. SpDCache: Region-Based Reduction Cache for Outer-Product Sparse Matrix Kernels. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2024-07-24.
2. Sparm: A Sparse Matrix Multiplication Accelerator Supporting Multiple Dataflows. 2024 IEEE 35th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2024-07-24.