Affiliation:
1. University of Central Florida, USA
Abstract
High-level synthesis (HLS) for FPGAs can achieve significant performance improvements through effective memory partitioning and careful data reuse. In this chapter, the authors first explore techniques adopted directly from systems with a fixed memory subsystem, such as CPUs and GPUs (Section 2). Section 3 focuses on techniques, developed specifically for reconfigurable architectures, that generate custom memory subsystems to exploit the peculiarities of stencil codes, a family of affine codes. Within it, the authors cover techniques that exploit memory banking to enable parallel, conflict-free memory accesses (Section 3.1) and techniques that generate an optimal memory micro-architecture for data reuse (Section 3.2). Finally, Section 4 explores techniques for handling code that still belongs to the affine family but in which the relative distance between the addresses…