Improving Loop Parallelization by a Combination of Static and Dynamic Analyses in HLS

Published: 2022-02-04 Issue: 3 Volume: 15 Pages: 1-31
ISSN: 1936-7406
Container title: ACM Transactions on Reconfigurable Technology and Systems
Language: en
Short container title: ACM Trans. Reconfigurable Technol. Syst.

Author:

Dewald Florian¹, Rohde Johanna², Hochberger Christian², Mantel Heiko¹

Affiliation:

1. MAIS chair, Dept. of Computer Science - TU Darmstadt, Hochschulstraße, Darmstadt, Germany

2. Computer Systems Group - TU Darmstadt, Merckstraße, Darmstadt, Germany

Abstract

High-level synthesis (HLS) can be used to create hardware accelerators for compute-intensive software parts such as loop structures. Usually, this process requires a significant amount of user interaction to steer kernel selection and optimizations. This can be tedious and time-consuming. In this article, we present an approach that fully autonomously finds independent loop iterations and reductions to create parallelized accelerators. We combine static analysis with information available only at runtime to maximize the parallelism exploited by the created accelerators. For loops where we see potential for parallelism, we create fully parallelized kernel implementations. If static information does not suffice to deduce independence, then we assume independence at compile time. We verify this assumption by statically created checks that are dynamically evaluated at runtime, before using the optimized kernel. Evaluating our approach, we can generate speedups for five out of seven benchmarks. With four loop iterations running in parallel, we achieve ideal speedups of up to 4× and on average speedups of 2.27×, both in comparison to an unoptimized accelerator.
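
The idea of assuming independence at compile time and verifying it at runtime can be illustrated with a minimal C sketch. This is not the authors' implementation: the function names (`scale`, `ranges_disjoint`, `scale_sequential`, `scale_parallel4`) and the loop body are hypothetical, and a four-way unrolled software loop merely stands in for a hardware kernel that runs four iterations in parallel. The assumed scenario is a loop whose iterations are independent only if the input and output arrays do not overlap, which cannot be decided statically when both are passed as pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Runtime check (hypothetical): returns 1 if the n-element ranges
 * starting at dst and src do not overlap in memory. */
static int ranges_disjoint(const int *dst, const int *src, size_t n)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return d + bytes <= s || s + bytes <= d;
}

/* Baseline kernel: one iteration at a time, correct for any aliasing. */
static void scale_sequential(int *dst, const int *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = 2 * src[i] + 1;
}

/* Optimized kernel: handles four iterations per step. All four loads
 * happen before any store, so it matches the sequential result only
 * when the iterations are independent. */
static void scale_parallel4(int *dst, const int *src, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        int a = src[i], b = src[i + 1], c = src[i + 2], d = src[i + 3];
        dst[i]     = 2 * a + 1;
        dst[i + 1] = 2 * b + 1;
        dst[i + 2] = 2 * c + 1;
        dst[i + 3] = 2 * d + 1;
    }
    for (; i < n; ++i)          /* remainder iterations */
        dst[i] = 2 * src[i] + 1;
}

/* Dispatcher: evaluates the check before using the optimized kernel.
 * Returns 1 if the parallel kernel ran, 0 if the fallback ran. */
int scale(int *dst, const int *src, size_t n)
{
    if (ranges_disjoint(dst, src, n)) {
        scale_parallel4(dst, src, n);
        return 1;
    }
    scale_sequential(dst, src, n);
    return 0;
}
```

Calling `scale(dst, src, 5)` with two separate arrays takes the parallel path. Calling `scale(buf + 1, buf, 5)` on a single buffer fails the check and falls back to the sequential kernel, because each iteration then reads the value written by the previous one. The check in this sketch is deliberately conservative: it also rejects the fully in-place case `dst == src`, even though those iterations are independent.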

Funder

Hessian LOEWE initiative within the Software-Factory 4.0 project

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3501801
