Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods

Published: 2011-08 Issue: 3 Volume: 4 Page: 1-14
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Boland David¹, Constantinides George A.¹

Affiliation:

1. Imperial College London, UK

Abstract

Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [Morris et al. 2006; Zhang et al. 2008; Zhuo and Prasanna 2006]. One class of algorithms to solve these systems, iterative methods, has drawn particular interest, with recent literature showing large performance improvements over General-Purpose Processors (GPPs) [Lopes and Constantinides 2008]. In several iterative methods, this performance gain is largely a result of parallelization of the matrix-vector multiplication, an operation that occurs in many applications and hence has also been widely studied on FPGAs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [Zhuo and Prasanna 2005], the nature of iterative methods allows the use of on-chip memory buffers to increase the bandwidth, providing the potential for significantly more parallelism [deLorimier and DeHon 2005]. Unfortunately, existing approaches have generally only either been capable of solving large matrices with limited improvement over GPPs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006; deLorimier and DeHon 2005], or achieve high performance for relatively small matrices [Lopes and Constantinides 2008; Boland and Constantinides 2008]. This article proposes hardware designs to take advantage of symmetrical and banded matrix structure, as well as methods to optimize the RAM use, in order to both increase the performance and retain this performance for larger-order matrices.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2000832.2000834

Reference: 19 articles.

1. Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Ed. SIAM, Philadelphia, PA.

2. Optimising Memory Bandwidth Use for Matrix-Vector Multiplication in Iterative Methods

3. Floating-point sparse matrix-vector multiply for FPGAs

4. Sparse Matrix-Vector Multiplication for Finite Element Method Matrices on FPGAs

Cited by 6 articles.

1. Hybrid CPU-GPU solution to regularized divergence-free curl-curl equations for electromagnetic inversion problems; Computers & Geosciences; 2024-02

2. Mixed and Multi-Precision SpMV for GPUs with Row-wise Precision Selection; 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD); 2022-11

3. Performance Analysis of Sparse Matrix-Vector Multiplication (SpMV) on Graphics Processing Units (GPUs); Electronics; 2020-10-13

4. Nonlinear predictive control on a heterogeneous computing platform; Control Engineering Practice; 2018-09

5. Nonlinear predictive control on a heterogeneous computing platform; IFAC-PapersOnLine; 2017-07