Flextended Tiles

Published:2019-12-31 Issue:4 Volume:16 Page:1-25
ISSN:1544-3566
Container-title:ACM Transactions on Architecture and Code Optimization
language:en
Short-container-title:ACM Trans. Archit. Code Optim.

Author:

Zhao Jie¹, Cohen Albert²

Affiliation:

1. INRIA & DI, École Normale Supérieure, Paris, France

2. Google, Paris, France

Abstract

Loop tiling to exploit data locality and parallelism plays an essential role in a variety of general-purpose and domain-specific compilers. Affine transformations in polyhedral frameworks implement classical forms of rectangular and parallelogram tiling, but these lead to pipelined start with rather inefficient wavefront parallelism. Multiple extensions to polyhedral compilers evaluated sophisticated shapes such as trapezoid or diamond tiles, enabling concurrent start along the axes of the iteration space; yet these resort to custom schedulers and code generators insufficiently integrated within the general framework. One of these modified shapes, referred to as overlapped tiling, also lacks a unifying framework to reason about its composition with affine transformations; this prevents its application in general-purpose loop-nest optimizers and a fair comparison with other techniques. We revisit overlapped tiling, recasting it as an affine transformation on schedule trees composable with any affine scheduling algorithm. We demonstrate how to derive tighter tile shapes with fewer redundant computations. Our method models the traditional “scalene trapezoid” shapes and novel “right-rectangle” variants. It goes beyond the state of the art by avoiding both the restriction to a domain-specific language and the need for post-pass rescheduling and custom code generation. We conduct experiments on the PolyMage benchmarks and iterated stencils, validating the effectiveness and applicability of our technique on both general-purpose multicores and GPU accelerators.
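
To make the notion of overlapped tiling concrete, the following is a minimal sketch of the classical idea on a 1D iterated 3-point stencil. It is not the schedule-tree formulation proposed in the paper. The function names (`stencil_step`, `reference`, `overlapped_tiled`) and the fixed-boundary averaging stencil are assumptions chosen for illustration. Each tile loads a halo of `steps` cells on either side and recomputes a shrinking (trapezoidal) region at every time step, so all tiles can start concurrently at the cost of some redundant work.

```python
def stencil_step(a):
    """One step of a 3-point average; the two boundary cells stay fixed."""
    n = len(a)
    return [a[i] if i in (0, n - 1) else (a[i - 1] + a[i] + a[i + 1]) / 3.0
            for i in range(n)]


def reference(a, steps):
    """Untiled baseline: apply the stencil `steps` times to the whole array."""
    for _ in range(steps):
        a = stencil_step(a)
    return a


def overlapped_tiled(a, steps, tile):
    """Overlapped tiling sketch (illustrative, not the paper's algorithm).

    Every tile is independent of the others: it reads only the input array,
    so the outer loop over tiles could run in parallel.
    Returns the result and the number of redundantly computed cells.
    """
    n = len(a)
    out = list(a)
    redundant = 0
    for lo in range(0, n, tile):
        hi = min(lo + tile, n)
        # Load the tile plus a halo of `steps` cells on each side.
        xlo, xhi = max(lo - steps, 0), min(hi + steps, n)
        buf = a[xlo:xhi]
        for t in range(1, steps + 1):
            # The region that can still be computed shrinks by one cell
            # per side at each time step, giving a trapezoidal tile.
            vlo = max(lo - (steps - t), 0)
            vhi = min(hi + (steps - t), n)
            new = list(buf)
            for i in range(vlo, vhi):
                if i == 0 or i == n - 1:
                    continue  # global boundary cells stay fixed
                j = i - xlo
                new[j] = (buf[j - 1] + buf[j] + buf[j + 1]) / 3.0
                if not (lo <= i < hi):
                    redundant += 1  # also computed by a neighbouring tile
            buf = new
        # Only the tile's own cells are written back.
        out[lo:hi] = buf[lo - xlo:hi - xlo]
    return out, redundant


data = [float(i * i % 7) for i in range(16)]
tiled, extra = overlapped_tiled(data, steps=2, tile=4)
print(tiled == reference(data, 2), extra)
```

The `redundant` counter makes the trade-off visible: wider halos (more time steps per tile) buy more reuse and synchronization-free parallelism but increase recomputation, which is the overhead the abstract's "tighter tile shapes" aim to reduce.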

Funder

National Natural Science Foundation of China

European Commission through the MNEMOSENE project

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3369382

Reference: 38 articles.

1. OpenTuner

2. Fast Local Laplacian Filters

3. Tiling and optimizing time-iterated computations on periodic domains

Cited by 11 articles.

1. SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide;IEEE Access;2024

2. Enhancing Programs Efficiency through a Machine Learning-Based Model for Tile Size Selection;BIO Web of Conferences;2024

3. Modeling the Interplay between Loop Tiling and Fusion in Optimizing Compilers Using Affine Relations;ACM Transactions on Computer Systems;2023-11-30

4. DHTS: A Dynamic Hybrid Tiling Strategy for Optimizing Stencil Computation on GPUs;IEEE Transactions on Computers;2023-10

5. A Methodology for Efficient Tile Size Selection for Affine Loop Kernels;International Journal of Parallel Programming;2022-05-23