
RIPL

Published:2018-03-31 Issue:1 Volume:11 Page:1-24
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

Stewart Robert¹, Duncan Kirsty¹, Michaelson Greg¹, Garcia Paulo¹, Bhowmik Deepayan², Wallace Andrew¹

Affiliation:

1. Heriot-Watt University, Edinburgh, UK

2. Sheffield Hallam University, Sheffield, UK

Abstract

Specialized FPGA implementations can deliver higher performance and greater power efficiency than embedded CPU or GPU implementations for real-time image processing. Programming challenges limit their wider use, because the implementation of FPGA architectures at the register transfer level is time consuming and error prone. Existing software languages supported by high-level synthesis (HLS), although providing a productivity improvement, are too general purpose to generate efficient hardware without the use of hardware-specific code optimizations. Such optimizations leak hardware details into the abstractions that software languages are there to provide, and they require knowledge of FPGAs to generate efficient hardware, such as by using language pragmas to partition data structures across memory blocks. This article presents a thorough account of the Rathlin image processing language (RIPL), a high-level image processing domain-specific language for FPGAs. We motivate its design, based on higher-order algorithmic skeletons, with requirements from the image processing domain. RIPL’s skeletons suffice to elegantly describe image processing stencils, as well as recursive algorithms with nonlocal random access patterns. At its core, RIPL employs a dataflow intermediate representation. We give a formal account of the compilation scheme from RIPL skeletons to static and cyclostatic dataflow models to describe their data rates and static scheduling on FPGAs. RIPL compares favorably to the Vivado HLS OpenCV library and C++ compiled with Vivado HLS. RIPL achieves between 54 and 191 frames per second (FPS) at 100MHz for four synthetic benchmarks, faster than HLS OpenCV in three cases. Two real-world algorithms are implemented in RIPL: visual saliency and mean shift segmentation. For the visual saliency algorithm, RIPL achieves 71 FPS compared to optimized C++ at 28 FPS. 
RIPL is also concise, being 5x shorter than C++ and 111x shorter than an equivalent direct dataflow implementation. For mean shift segmentation, RIPL achieves 7 FPS compared to 1.1 FPS for optimized C++ on 64 CPU cores, and RIPL is 10x shorter than the direct dataflow FPGA implementation.
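The abstract notes that RIPL skeletons compile to static (SDF) and cyclostatic (CSDF) dataflow models so that data rates and schedules can be fixed at compile time. As a generic illustration of what a static data-rate model provides, and not RIPL's own compiler, the sketch below solves the standard SDF balance equations (firings of the producer × tokens produced = firings of the consumer × tokens consumed) to get the smallest integer firing count per actor, or reports that no static schedule exists. The actor names (`src`, `down`, `sink`) and their rates are hypothetical.

```python
from collections import deque
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Solve the SDF balance equations q[src] * prod == q[dst] * cons.

    actors: list of actor names (the graph is assumed to be connected).
    edges:  list of (src, dst, prod, cons) token counts per firing.
    Returns the smallest positive integer firing count for each actor.
    """
    # Each edge fixes the ratio between the firing counts of its two endpoints.
    adj = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        adj[src].append((dst, Fraction(prod, cons)))  # q[dst] = q[src] * prod / cons
        adj[dst].append((src, Fraction(cons, prod)))  # q[src] = q[dst] * cons / prod

    # Propagate rational firing counts outward from the first actor.
    q = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in adj[a]:
            if b not in q:
                q[b] = q[a] * ratio
                queue.append(b)

    # Every edge must balance; otherwise no periodic static schedule exists.
    for src, dst, prod, cons in edges:
        if q[src] * prod != q[dst] * cons:
            raise ValueError("inconsistent SDF graph: no valid static schedule")

    # Scale the rationals to the smallest positive integers.
    denom = lcm(*(f.denominator for f in q.values()))
    ints = {a: int(f * denom) for a, f in q.items()}
    g = 0
    for v in ints.values():
        g = gcd(g, v)
    return {a: v // g for a, v in ints.items()}


if __name__ == "__main__":
    # Hypothetical pipeline: a pixel source, a 4:1 downsampler, and a sink.
    actors = ["src", "down", "sink"]
    edges = [("src", "down", 1, 4), ("down", "sink", 1, 1)]
    print(repetition_vector(actors, edges))  # {'src': 4, 'down': 1, 'sink': 1}
```

Because every rate is known before synthesis, the firing counts (and hence FIFO sizes and a static schedule) can be decided at compile time; a cyclostatic model extends the same idea to rates that vary over a fixed cycle of phases.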

Funder

EPSRC

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3180481

References (44 articles)

1. A 16-nm Multiprocessing System-on-Chip Field-Programmable Gate Array Platform

2. Programming models for hybrid FPGA-cpu computational components: a missing link

3. Power efficient dataflow design for a heterogeneous smart camera architecture

Cited by 17 articles.

1. Allo: A Programming Model for Composable Accelerator Design;Proceedings of the ACM on Programming Languages;2024-06-20

2. SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide;IEEE Access;2024

3. The Good, the Bad and the Ugly: Practices and Perspectives on Hardware Acceleration for Embedded Image Processing;Journal of Signal Processing Systems;2023-07-29

4. ImaGen: A General Framework for Generating Memory- and Power-Efficient Image Processing Accelerators;Proceedings of the 50th Annual International Symposium on Computer Architecture;2023-06-17

5. HDLRuby: A Ruby Extension for Hardware Description and its Translation to Synthesizable Verilog HDL;ACM Transactions on Embedded Computing Systems;2023-02