FPGA-Based Sparse Matrix Multiplication Accelerators: From State-of-the-art to Future Opportunities

Published: 2024-08-28
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Liu Yajing¹, Chen Ruiqi², Li Shuyang³, Yang Jing¹, Li Shun⁴, Silva Bruno da²

Affiliation:

1. Fuzhou University, China

2. Vrije Universiteit Brussel, Belgium

3. Fudan University, China

4. Fuzhou University, China and VeriMake Innovation Lab, China

Abstract

Sparse matrix multiplication (SpMM) plays a critical role in high-performance computing applications, such as deep learning, image processing, and physical simulation. Field-Programmable Gate Arrays (FPGAs), with their configurable hardware resources, can be tailored to accelerate SpMMs. There has been considerable research on deploying sparse matrix multipliers across various FPGA platforms. However, the FPGA-based design of sparse matrix multipliers still presents numerous challenges. Therefore, it is necessary to summarize and organize the current work to provide a reference for further research. This paper first introduces the computational method of SpMM and categorizes the different challenges of FPGA deployment. Following this, we introduce and analyze a variety of state-of-the-art FPGA-based accelerators tailored for SpMMs. In addition, a comparative analysis of these accelerators is performed, examining metrics including compression rate, throughput, and resource utilization. Finally, we propose potential research directions and challenges for further study of FPGA-based SpMM accelerators.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3687480

References: 71 articles.

1. Leon Adams and Strategic Marketing. 2002. Choosing the right architecture for real-time signal processing designs. Texas Instruments, Dallas, Texas, USA.

2. Ariful Azad, Aydin Buluç, and John Gilbert. 2015. Parallel Triangle Counting and Enumeration Using Matrix Algebra. In 2015 IEEE International Parallel and Distributed Processing Symposium Workshop. IEEE, Hyderabad, India, 804–811. https://doi.org/10.1109/IPDPSW.2015.75

3. The Combinatorial BLAS: design, implementation, and applications

4. Ruiqi Chen, Haoyang Zhang, Shun Li, Enhao Tang, Jun Yu, and Kun Wang. 2023. Graph-OPU: A Highly Integrated FPGA-Based Overlay Processor for Graph Neural Networks. In 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, Gothenburg, Sweden, 228–234. https://doi.org/10.1109/FPL60245.2023.00039

5. Ruiqi Chen, Haoyang Zhang, Yuhanxiao Ma, Jianli Chen, Jun Yu, and Kun Wang. 2023. eSSpMV: An Embedded-FPGA-based Hardware Accelerator for Symmetric Sparse Matrix-Vector Multiplication. In 2023 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, Monterey, CA, USA, 1–5. https://doi.org/10.1109/ISCAS46773.2023.10181734