FPGA-Array with Bandwidth-Reduction Mechanism for Scalable and Power-Efficient Numerical Simulations Based on Finite Difference Methods

Author:

Sano Kentaro¹, Luzhou Wang¹, Hatsuda Yoshiaki¹, Iizuka Takanori¹, Yamamoto Satoru¹

Affiliation:

1. Tohoku University

Abstract

For scientific numerical simulations that require a relatively high ratio of data access to computation, the scalability of memory bandwidth is the key to performance improvement. Custom-computing machines (CCMs) are therefore a promising approach, because they can provide bandwidth-aware structures tailored to individual applications. In this article, we propose a scalable FPGA-array with a bandwidth-reduction mechanism (BRM) to implement high-performance and power-efficient CCMs for scientific simulations based on finite difference methods. With the FPGA-array, we construct a systolic computational-memory array (SCMA), which is given a minimum of programmability to provide flexibility and high productivity for various computing kernels and boundary computations. Since the systolic computational-memory architecture of the SCMA scales both memory bandwidth and arithmetic performance with the array size, we introduce a homogeneous partitioning approach that allows the SCMA to be extended over a 1D or 2D array of FPGAs connected by a mesh network. To satisfy the bandwidth requirement of inter-FPGA communication, we propose BRM based on time-division multiplexing. BRM decreases the required number of communication channels between adjacent FPGAs at the cost of delay cycles, and we formulate the trade-off between bandwidth and delay of inter-FPGA data transfer with BRM. To demonstrate feasibility and evaluate performance quantitatively, we design and implement an SCMA of 192 processing elements over two ALTERA Stratix II FPGAs. The implemented SCMA, running at 106 MHz, has a peak performance of 40.7 GFlops in single precision. We demonstrate that the SCMA achieves sustained performance of 32.8 to 35.7 GFlops for three benchmark computations, with high utilization of its computing units. The SCMA scales completely with the number of FPGAs owing to its highly localized computation and communication.
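The peak and sustained figures, and the channel/delay trade-off that BRM exploits, can be checked with a little arithmetic. The Python sketch below is a simplified model under stated assumptions, not the authors' formulation. It assumes each PE completes 2 flops per cycle (one add and one multiply); this is the value implied by 40.7 GFlops / (192 PEs x 106 MHz), not something the abstract states. The names `tdm_budget` and `LinkBudget`, the round-robin slot scheme, and the 16-PE boundary are hypothetical illustrations.

```python
import math
from dataclasses import dataclass

# Figures quoted in the abstract.
NUM_PES = 192
CLOCK_HZ = 106e6
# Assumption (implied by the quoted numbers, not stated): each PE
# retires one single-precision add and one multiply per cycle.
FLOPS_PER_PE_PER_CYCLE = 2


def peak_gflops(num_pes: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Peak rate = PEs x flops per PE per cycle x clock frequency."""
    return num_pes * flops_per_cycle * clock_hz / 1e9


def utilization(sustained_gflops: float, peak: float) -> float:
    """Fraction of the peak rate delivered by a benchmark."""
    return sustained_gflops / peak


@dataclass
class LinkBudget:
    logical_links: int  # PE-to-PE links crossing the FPGA boundary
    channels: int       # physical inter-FPGA channels after multiplexing
    extra_delay: int    # worst-case added latency, in cycles


def tdm_budget(boundary_pes: int, links_per_pe: int, m: int) -> LinkBudget:
    """Hypothetical time-division-multiplexing model (not the paper's):
    m logical links share one physical channel in round-robin slots.
    Assumes each logical link is active on at most one cycle in every m,
    so a shared channel is never oversubscribed; the cost is then a wait
    of up to m - 1 cycles for the link's slot."""
    if m < 1:
        raise ValueError("multiplexing factor m must be >= 1")
    logical = boundary_pes * links_per_pe
    return LinkBudget(logical, math.ceil(logical / m), m - 1)


if __name__ == "__main__":
    peak = peak_gflops(NUM_PES, CLOCK_HZ, FLOPS_PER_PE_PER_CYCLE)
    print(f"peak: {peak:.1f} GFlops")
    for sustained in (32.8, 35.7):
        print(f"{sustained} GFlops = {utilization(sustained, peak):.1%} of peak")
    # Hypothetical cut with 16 PEs on the boundary, one outgoing link each.
    for m in (1, 2, 4, 8):
        print(m, tdm_budget(16, 1, m))
```

Under the 2-flops-per-cycle assumption, this reproduces the 40.7 GFlops peak and places the sustained results at about 80.6% and 87.7% of peak. In the toy link model, each doubling of the multiplexing factor halves the channel count and roughly doubles the worst-case wait. This is the kind of bandwidth-for-delay exchange the abstract describes.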
In addition, we demonstrate that the FPGA-based SCMA is power-efficient: it consumes 69% to 87% of the power and only 2.8% to 7.0% of the energy required for the same computations on a 3.4-GHz Pentium 4 processor. With software simulation, we show that BRM works effectively for the benchmark computations, and therefore that commercially available low-end FPGAs with relatively narrow I/O bandwidth can be used to construct a scalable FPGA-array.
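Because energy is power multiplied by execution time, the power and energy percentages together imply how much faster the SCMA finishes than the Pentium 4. The Python sketch below derives bounds on that speedup from the quoted ranges. The speedup is an inference, not a figure reported in the abstract. It assumes both percentages describe the same benchmark runs. The abstract does not pair each energy figure with a power figure, so only the extreme combinations are used. The helper names are hypothetical.

```python
# Ranges quoted in the abstract: FPGA-based SCMA relative to a
# 3.4-GHz Pentium 4 running the same computations.
POWER_RATIO = (0.69, 0.87)
ENERGY_RATIO = (0.028, 0.070)


def time_ratio(energy_ratio: float, power_ratio: float) -> float:
    """Since E = P * t, t_fpga / t_cpu = (E_fpga / E_cpu) / (P_fpga / P_cpu)."""
    return energy_ratio / power_ratio


def speedup_bounds(energy: tuple, power: tuple) -> tuple:
    """Lower and upper bounds on the speedup t_cpu / t_fpga.

    The slowest case pairs the largest energy ratio with the smallest
    power ratio; the fastest case pairs the smallest with the largest."""
    slowest = time_ratio(max(energy), min(power))
    fastest = time_ratio(min(energy), max(power))
    return 1 / slowest, 1 / fastest


low, high = speedup_bounds(ENERGY_RATIO, POWER_RATIO)
print(f"implied speedup: between {low:.1f}x and {high:.1f}x")
```

Under these assumptions, every benchmark's speedup over the Pentium 4 lies between about 9.9x and 31.1x. The per-benchmark measurements in the full paper would determine where in that range each one falls.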

Funder

Ministry of Education, Culture, Sports, Science, and Technology

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 21 articles.

1. A Scalable Many-core Overlay Architecture on an HBM2-enabled Multi-Die FPGA; ACM Transactions on Reconfigurable Technology and Systems; 2023-01-18

2. Packed SIMD Vectorization of the DRAGON2-CB; 2022 IEEE 15th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC); 2022-12

3. An efficient FPGA overlay for MPI-2 RMA parallel applications; 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS); 2022-06-19

4. A Highly-Efficient and Tightly-Connected Many-Core Overlay Architecture; IEEE Access; 2021

5. Performance Analysis of Hardware-Based Numerical Data Compression on Various Data Formats; 2018 Data Compression Conference; 2018-03
