Affiliation:
1. Hardware/Software Co-Design, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Abstract
Today’s MPSoCs (multiprocessor systems-on-chip) have given rise to massively parallel processor array accelerators that can achieve high computational efficiency by exploiting multiple levels of parallelism and different memory hierarchies. Such parallel processor arrays are ideal targets for the acceleration of nested loop programs in particular, due to their regular and massively parallel nature. However, existing loop parallelization techniques are often unable to exploit multiple levels of parallelism and are either I/O- or memory-bound. Furthermore, if the number of available processing elements becomes known only at runtime, as in adaptive systems, static approaches fail. In this article, we address some of these problems by proposing a hybrid compile/runtime multi-level symbolic parallelization technique that is able to (a) exploit multiple levels of parallelism, (b) exploit different memory hierarchies, and (c) match the I/O or memory capabilities of the target architecture in scenarios where the number of available processing elements is known only at runtime. Our proposed technique consists of two compile-time transformations: (a) symbolic hierarchical tiling followed by (b) symbolic multi-level scheduling. The tiling levels scheduled in parallel exploit different levels of parallelism, whereas the sequentially scheduled level exploits different memory hierarchies. Furthermore, by tuning the tile sizes on the individual levels, a tradeoff between the required I/O bandwidth and memory is possible, which facilitates obeying resource constraints. The resulting schedules are symbolic with respect to the problem size and the tile sizes; thus, the number of processing elements to map onto does not need to be known at compile time. At runtime, once the number of available processors becomes known, a simple prologue chooses a schedule that is feasible with respect to I/O and memory constraints and latency-optimal for the chosen tile size. In summary, our approach determines the set of feasible, latency-optimal symbolic loop schedule candidates at compile time, from which one is dynamically selected at runtime. The approach exploits multiple levels of parallelism, is independent of the problem size of the loop nest, and thereby avoids any expensive re-compilation at runtime. This is particularly important for low-cost, memory-scarce embedded MPSoC platforms that may not be able to afford hosting a just-in-time compiler.
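To make the two ideas in the abstract more concrete, the following is a minimal, self-contained C sketch of (a) a loop tiled hierarchically into a parallel and a sequential level with symbolic tile sizes and (b) a runtime prologue that selects one of several precomputed schedule candidates based on the number of available processing elements and a memory budget. The loop nest, the candidate table, and the selection criterion are illustrative assumptions and are not the schedules or cost models from the paper.

```c
/*
 * Illustrative sketch (not the authors' implementation): a 1-D loop of
 * problem size N is tiled hierarchically with symbolic tile sizes p1
 * (inner level, mapped to parallel PEs) and p2 (per-PE working set,
 * executed sequentially).  At runtime, a small prologue picks a feasible
 * (p1, p2) pair from a table of candidates prepared at compile time,
 * based on the number of available PEs and a per-PE memory budget.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int  p1;          /* inner tile size: degree of parallelism    */
    int  p2;          /* outer tile size: local-memory working set */
    long mem_per_pe;  /* estimated local memory needed per PE      */
} schedule_candidate;

/* Candidate table; in the paper's flow these would be the symbolic,
 * latency-optimal schedules determined at compile time (the values
 * here are made up for illustration). */
static const schedule_candidate candidates[] = {
    { 16,  64,  4096 },
    {  8, 128,  8192 },
    {  4, 256, 16384 },
};

/* Runtime prologue: choose the first candidate that fits the number of
 * available PEs and the per-PE memory budget. */
static const schedule_candidate *select_schedule(int avail_pes, long mem_budget)
{
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; ++i)
        if (candidates[i].p1 <= avail_pes && candidates[i].mem_per_pe <= mem_budget)
            return &candidates[i];
    return NULL; /* no feasible schedule under the given constraints */
}

int main(void)
{
    enum { N = 4096 };
    static float a[N], b[N];
    for (int i = 0; i < N; ++i) a[i] = (float)i;

    const schedule_candidate *s = select_schedule(/*avail_pes=*/8, /*mem_budget=*/8192);
    if (!s) { fprintf(stderr, "no feasible schedule\n"); return EXIT_FAILURE; }

    /* Hierarchically tiled loop nest: the outermost level walks tiles of
     * size p1*p2 sequentially; the middle level would be executed by p1
     * PEs in parallel (kept sequential here for a self-contained example). */
    int tile = s->p1 * s->p2;
    for (int t = 0; t < N; t += tile)             /* level 2: sequential     */
        for (int p = 0; p < s->p1; ++p)           /* level 1: parallel (PEs) */
            for (int k = 0; k < s->p2; ++k) {     /* level 0: within one PE  */
                int i = t + p * s->p2 + k;
                if (i < N) b[i] = 2.0f * a[i];
            }

    printf("selected p1=%d p2=%d, b[100]=%f\n", s->p1, s->p2, b[100]);
    return 0;
}
```

The split into a parallel and a sequential tiling level mirrors the tradeoff described above: enlarging p1 increases the exploited parallelism (and I/O demand), while enlarging p2 shifts work into each PE's local memory.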
Funder
German Research Foundation (DFG) as part of the Transregional Collaborative Research Center “Invasive Computing”
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by
2 articles.