Optimal Loop Unrolling and Shifting for Reconfigurable Architectures

Published:2009-09 Issue:4 Volume:2 Page:1-24
ISSN:1936-7406
Container-title:ACM Transactions on Reconfigurable Technology and Systems
Language:en
Short-container-title:ACM Trans. Reconfigurable Technol. Syst.

Author:

Dragomir Ozana Silvia¹, Stefanov Todor¹, Bertels Koen¹

Affiliation:

1. TU Delft

Abstract

In this article, we present a new technique for optimizing loops that contain kernels mapped on a reconfigurable fabric. We assume the Molen machine organization as our framework. We propose combining loop unrolling with loop shifting, which is used to relocate the function calls contained in the loop body such that in every iteration of the transformed loop, software functions (running on GPP) execute in parallel with multiple instances of the kernel (running on FPGA). The algorithm computes the optimal unroll factor and determines the most appropriate transformation (which can be the combination of unrolling plus shifting or either of the two). This method is based on profiling information about the kernel’s execution times on GPP and FPGA, memory transfers and area utilization. In the experimental part, we apply this method to several kernels from loop nests extracted from real-life applications (DCT and SAD from MPEG2 encoder, Quantizer from JPEG, and Sobel’s Convolution) and perform an analysis of the results, comparing them with the theoretical maximum speedup by Amdahl’s Law and showing when and how our transformations are beneficial.
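The combination of unrolling and shifting described in the abstract can be illustrated with a small scheduling sketch. This is not the authors' algorithm: it assumes a loop body of one software call followed by one kernel call, an iteration count that is a multiple of the unroll factor, and hypothetical names (`original_schedule`, `unrolled_shifted_schedule`, the `"sw"` and `"k"` labels) chosen only for illustration. It shows the order of calls, not how the optimal unroll factor is computed.

```python
def original_schedule(n):
    """Untransformed loop: sw(i) on the GPP, then k(i) on the FPGA, one at a time."""
    steps = []
    for i in range(n):
        steps.append([("sw", i)])
        steps.append([("k", i)])
    return steps


def unrolled_shifted_schedule(n, u):
    """Unroll by u, then shift the software calls one unrolled iteration ahead.

    Each step is a list of calls that overlap in time: the kernel
    instances are assumed to run concurrently on the FPGA while the
    software calls of the next block run on the GPP.
    Assumption: n is a multiple of u, to keep the sketch short.
    """
    assert u > 0 and n % u == 0
    steps = []
    # Prologue: software part of the first unrolled iteration.
    steps.append([("sw", i) for i in range(u)])
    # Steady state: kernels of one block overlap with software of the next.
    for base in range(0, n - u, u):
        kernels = [("k", i) for i in range(base, base + u)]
        software = [("sw", i) for i in range(base + u, base + 2 * u)]
        steps.append(kernels + software)
    # Epilogue: kernels of the last block.
    steps.append([("k", i) for i in range(n - u, n)])
    return steps


if __name__ == "__main__":
    for step in unrolled_shifted_schedule(4, 2):
        print(step)
```

In this sketch every `("k", i)` still appears in a later step than its `("sw", i)`, so the assumed dependence of the kernel on the preceding software call is preserved while the two kinds of work overlap.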

Funder

Sixth Framework Programme

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/1575779.1575785

References: 14 articles.

1. PARLGRAN

2. Optimal Unroll Factor for Reconfigurable Architectures

Cited by 4 articles.

1. Minimizing control dependencies of pipelining through optimizing branch selection;2024-04-30

2. FPGA-Based Hardware Implementation of a Stable Inverse Source Problem Algorithm in a Non-Homogeneous Circular Region;Applied Sciences;2024-02-08

3. Loop Unrolling for Energy Efficiency in Low-Cost Field-Programmable Gate Arrays;ACM Transactions on Reconfigurable Technology and Systems;2019-01-29

4. C2FPGA—A dependency-timing graph design methodology;Journal of Parallel and Distributed Computing;2013-11