Memory Interface Design for 3D Stencil Kernels on a Massively Parallel Memory System

Published: 2015-10 Issue: 4 Volume: 8 Page: 1-24
ISSN: 1936-7406
Container-title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short-container-title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Jin Zheming¹, Bakos Jason D.¹

Affiliation:

1. University of South Carolina, Columbia

Abstract

Massively parallel memory systems are designed to deliver high bandwidth at relatively low clock speeds for memory-intensive applications implemented on programmable logic. For example, the Convey HC-1 provides 1,024 DRAM banks to each of four FPGAs through a full crossbar, presenting a peak bandwidth of 76.8 GB/s to the user logic. Such highly parallel memory systems suffer from high latency, and their effective bandwidth is highly sensitive to access ordering. To achieve high performance, the user must use a customized memory interface that combines scheduling, latency hiding, and data reuse. In this article, we describe the design of a custom memory interface for 3D stencil kernels on the Convey HC-1 that incorporates these features. Experimental results show that, compared to a naive memory interface, the proposed memory interface achieves a runtime speedup of 2.2x for a 6-point stencil and 9.5x for a 27-point stencil.
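
For readers unfamiliar with the terms, the following is a minimal sketch of what 6-point and 27-point 3D stencils compute and how many input reads a naive implementation issues. It is not the article's memory interface or its kernels: the function names `stencil_offsets` and `apply_stencil`, the uniform averaging weights, and the nested-list grid are all assumptions made for illustration.

```python
from itertools import product


def stencil_offsets(points):
    """Return the (dz, dy, dx) neighbor offsets for a 3D stencil (illustrative)."""
    if points == 6:
        # Face neighbors only: exactly one step along a single axis.
        return [o for o in product((-1, 0, 1), repeat=3)
                if sum(abs(c) for c in o) == 1]
    if points == 27:
        # Full 3x3x3 cube, including the center point.
        return list(product((-1, 0, 1), repeat=3))
    raise ValueError("only 6- and 27-point stencils are sketched here")


def apply_stencil(grid, points):
    """Naively average the stencil neighbors at every interior grid point.

    Uniform weights are an assumption; real kernels use their own coefficients.
    Returns (output grid, number of input reads issued).
    """
    nz, ny, nx = len(grid), len(grid[0]), len(grid[0][0])
    offsets = stencil_offsets(points)
    out = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    reads = 0
    for z in range(1, nz - 1):
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                total = 0.0
                for dz, dy, dx in offsets:
                    # Every neighbor is fetched again for every output point.
                    total += grid[z + dz][y + dy][x + dx]
                    reads += 1
                out[z][y][x] = total / len(offsets)
    return out, reads


if __name__ == "__main__":
    grid = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
    for points in (6, 27):
        _, reads = apply_stencil(grid, points)
        print(points, "point stencil, input reads:", reads)
```

In this naive form every output point re-reads all of its neighbors, so the 27-point case issues 4.5 times as many reads as the 6-point case over the same grid. A buffer that keeps recently fetched values on chip could in principle serve most of those reads without going back to memory, which is the kind of data reuse the abstract refers to.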

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2800788

Cited by 2 articles.

1. Efficient Homomorphic Convolution Designs on FPGA for Secure Inference;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2022-11

2. DCMI;ACM Transactions on Architecture and Code Optimization;2019-12-31