Affiliation:
1. RIKEN Center for Computational Science, Japan
2. Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Italy
Abstract
Stencil-based applications play an essential role in high-performance systems as they occur in numerous computational areas, such as partial differential equation solving. In this context, Iterative Stencil Loops (ISLs) represent a prominent and well-known algorithmic class within the stencil domain. Specifically, ISL-based calculations iteratively apply the same stencil to a multi-dimensional point grid multiple times or until convergence. However, due to their iterative and intensive nature, ISLs are highly performance-hungry, demanding specialized solutions. Here, Field Programmable Gate Arrays (FPGAs) represent a valid architectural choice as they enable the design of custom, parallel, and scalable ISL accelerators. Besides, the regular structure of ISLs makes them an ideal candidate for automatic optimization and generation flows. For these reasons, this paper introduces Senju, an automation framework for the design of highly parallel ISL accelerators targeting single-/multi-FPGA systems. Given an input description, Senju automates the entire design process and provides accurate performance estimations. The experimental evaluation shows remarkable and scalable results, outperforming single- and multi-FPGA literature approaches under different metrics. Finally, we present a new analysis of temporal and spatial parallelism trade-offs in a real-case scenario and discuss our performance through a single- and novel specialized multi-FPGA formulation of the Roofline Model.
Publisher
Association for Computing Machinery (ACM)
Cited by
2 articles.
1. Exploration of Trade-offs Between General-Purpose and Specialized Processing Elements in HPC-Oriented CGRA;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27
2. Flexible Systolic Array Platform on Virtual 2-D Multi-FPGA Plane;Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region;2024-01-18