Affiliation:
1. The Ohio State University, Columbus, OH
2. Louisiana State University, Baton Rouge, LA
Abstract
Performance optimization of stencil computations has been widely studied in the literature, since they occur in many computationally intensive scientific and engineering applications. Compiler frameworks have also been developed that can transform sequential stencil codes for optimization of data locality and parallelism. However, loop skewing is typically required in order to tile stencil codes along the time dimension, resulting in load imbalance in pipelined parallel execution of the tiles. In this paper, we develop an approach for automatic parallelization of stencil codes that explicitly addresses the issue of load-balanced execution of tiles. Experimental results are provided that demonstrate the effectiveness of the approach.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
58 articles.
1. Bricks: A high-performance portability layer for computations on block-structured grids;The International Journal of High Performance Computing Applications;2024-08-19
2. Stencil Computation with Vector Outer Product;Proceedings of the 38th ACM International Conference on Supercomputing;2024-05-30
3. Fast American Option Pricing using Nonlinear Stencils;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20
4. SlidingConv: Domain-Specific Description of Sliding Discrete Cosine Transform Convolution for Halide;IEEE Access;2024
5. Low-Cost Post Hoc Reconstruction of HPC Simulations at Full Resolution;2023 IEEE 13th Symposium on Large Data Analysis and Visualization (LDAV);2023-10-23