A Strategy for Automatic Performance Tuning of Stencil Computations on GPUs

Published: 2018-05-28 Volume: 2018 Pages: 1-24
ISSN: 1058-9244
Container-title: Scientific Programming
Language: en
Short-container-title: Scientific Programming

Author:

Garvey Joseph D.¹, Abdelrahman Tarek S.¹

Affiliation:

1. Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, Canada M5S 3G4

Abstract

We propose and evaluate a novel strategy for tuning the performance of a class of stencil computations on Graphics Processing Units (GPUs). The strategy first uses a machine learning model to predict the optimal way to load data from memory, then applies a heuristic that divides the remaining optimizations into groups and exhaustively explores one group at a time. We use a set of 104 synthetic OpenCL stencil benchmarks that are representative of many real stencil computations. We first demonstrate the need for auto-tuning by showing that the optimization space is sufficiently complex that simple approaches to determining a high-performing configuration fail. We then demonstrate the effectiveness of our approach on NVIDIA and AMD GPUs. Relative to a random sampling of the space, we find configurations that are 12% and 32% faster on the NVIDIA and AMD platforms in 71% and 4% less time, respectively. Relative to an expert search, we achieve 5% and 9% better performance on the two platforms in 89% and 76% less time, respectively. We also evaluate our strategy for different stencil computational intensities, for varying array sizes and shapes, and in combination with expert search.
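The two-stage idea in the abstract (fix the memory-loading choice first, then search the remaining optimizations one group at a time) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The parameter names (`block_x`, `block_y`, `unroll`, `coarsen`, `load`), the grouping, the rule-based `predict_load_strategy` stand-in for the machine learning model, and the `synthetic_time` cost function are all hypothetical and chosen only to make the example runnable.

```python
from itertools import product


def predict_load_strategy(stencil_radius):
    # Hypothetical stand-in for the paper's machine learning model:
    # a fixed rule, NOT a trained predictor.
    return "local" if stencil_radius >= 2 else "global"


def synthetic_time(cfg):
    # Made-up cost function standing in for a real kernel timing run.
    t = 10.0
    t += abs(cfg["block_x"] - 32) * 0.1
    t += abs(cfg["block_y"] - 4) * 0.5
    t += abs(cfg["unroll"] - 2) * 0.3
    t += abs(cfg["coarsen"] - 2) * 0.2
    if cfg["load"] == "local":
        t -= 1.0
    return t


def groupwise_search(groups, defaults, measure):
    """Exhaustively explore one parameter group at a time,
    keeping the best values found so far for all other groups."""
    best = dict(defaults)
    best_time = measure(best)
    evaluations = 1
    for group in groups:
        names = list(group)
        for values in product(*(group[n] for n in names)):
            candidate = dict(best)
            candidate.update(zip(names, values))
            t = measure(candidate)
            evaluations += 1
            if t < best_time:
                best, best_time = candidate, t
    return best, best_time, evaluations


# Hypothetical optimization groups (assumed for illustration only).
groups = [
    {"block_x": [8, 16, 32], "block_y": [1, 2, 4, 8]},
    {"unroll": [1, 2, 4], "coarsen": [1, 2, 4]},
]
defaults = {
    "block_x": 8, "block_y": 1, "unroll": 1, "coarsen": 1,
    "load": predict_load_strategy(stencil_radius=2),
}

best, best_time, evaluations = groupwise_search(groups, defaults, synthetic_time)
full_space = 3 * 4 * 3 * 3  # size of the exhaustive search over all four parameters
print(best, best_time, evaluations, full_space)
```

In this toy setup the group-wise search measures far fewer configurations than a full exhaustive search would, at the cost of possibly missing interactions between parameters placed in different groups. How the groups are chosen therefore matters, and the grouping shown here is only an assumption.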

Funder

Natural Sciences and Engineering Research Council of Canada

Publisher

Hindawi Limited

Subject

Computer Science Applications, Software

Link

http://downloads.hindawi.com/journals/sp/2018/6093054.pdf

References: 9 articles.

1. Stencil-Aware GPU Optimization of Iterative Solvers

2. An investigation of the efficient implementation of cellular automata on multi-core CPU and GPU hardware

3. Iterative methods for solving partial difference equations of elliptic type

Cited by 3 articles.

1. Incremental Auto-Tuning for Hybrid Parallelization Using OpenCL;2023 IEEE 12th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS);2023-09-07

2. Optimization Techniques for GPU Programming;ACM Computing Surveys;2023-03-16

3. Using Compiler Directives for Performance Portability in Scientific Computing: Kernels from Molecular Simulation;Accelerator Programming Using Directives;2019