The Pluto+ Algorithm

Author:

Bondhugula Uday1,Acharya Aravind1,Cohen Albert2

Affiliation:

1. Indian Institute of Science, Bangalore, India

2. INRIA and ENS, Paris, France

Abstract

Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures for which coarse-grained parallelism and locality are crucial, consider only a subspace of transformations to avoid a combinatorial explosion in finding transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations: in particular, transformation compositions involving loop reversals and loop skewing by negative factors. In addition, there is currently no proof that the algorithm successfully finds a tree of permutable loop bands for all affine loop nests. In this article, we propose an approach to address these two issues (1) by modeling a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, and (2) by extending the Pluto algorithm in a way that allows a proof for its soundness and completeness for all affine loop nests. We perform an experimental evaluation of both, the effect on compilation time, and performance of generated codes. The evaluation shows that our new framework, Pluto+, provides no degradation in performance for any benchmark from Polybench. For the Lattice Boltzmann Method (LBM) simulations with periodic boundary conditions, it provides a mean speedup of 1.33 × over Pluto. We also show that Pluto+ does not increase compilation time significantly. Experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases in which it improves execution time significantly, it increased polyhedral optimization time by only 2.04 × .

Funder

INRIA Associate Team PolyFlow

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 37 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. DNNOPT: A Framework for Efficiently Selecting On-chip Memory Loop Optimizations of DNN Accelerators;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07

2. Optimizing Deep Learning Inference via Global Analysis and Tensor Expressions;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1;2024-04-17

3. Energy-Aware Tile Size Selection for Affine Programs on GPUs;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02

4. PolyTOPS: Reconfigurable and Flexible Polyhedral Scheduler;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02

5. Fast American Option Pricing using Nonlinear Stencils;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3