An optimizing compiler for GPGPU programs with input-data sharing

Published:2010-05 Issue:5 Volume:45 Page:343-344
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Yang Yi¹, Xiang Ping², Kong Jingfei², Zhou Huiyang¹

Affiliation:

1. North Carolina State University, Raleigh, NC, USA

2. University of Central Florida, Orlando, FL, USA

Abstract

Developing high-performance GPGPU programs is challenging for application developers because performance depends on how well the code leverages the hardware features of specific graphics processors. To relieve application developers of low-level, hardware-specific optimizations, we introduce a novel compiler that optimizes GPGPU programs. The compiler takes as input a naive GPU kernel function, one that is functionally correct but written without any consideration for performance. It then analyzes the code, identifies memory access patterns, and generates optimized code. The proposed optimizations target one category of scientific and media-processing algorithms: those that share input data when computing neighboring output pixels or elements. Many commonly used algorithms, such as matrix multiplication and convolution, have this characteristic. For these algorithms, novel approaches are proposed to enforce memory coalescing and achieve effective data reuse. Data prefetching and hardware-specific tuning are also performed automatically within our compiler framework. Experimental results on a set of applications show that code generated by our compiler achieves very high performance, either superior or very close to that of the highly fine-tuned library NVIDIA CUBLAS 2.1.
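
To make the idea of input-data sharing concrete, the following is a minimal sketch in Python, not the compiler's actual output and not code from the paper. It runs on the CPU and only models data reuse by counting simulated global-memory loads for matrix multiplication; it does not model memory coalescing, prefetching, or hardware tuning. The function names `naive_matmul` and `tiled_matmul`, the `tile` parameter, and the load-counting scheme are all assumptions introduced for illustration.

```python
def naive_matmul(a, b, n):
    """Multiply two n x n matrices; every output element re-reads its inputs.

    Returns (c, loads), where loads counts simulated global-memory reads.
    """
    loads = 0
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i][k] * b[k][j]
                loads += 2  # one read of a and one read of b per multiply
            c[i][j] = acc
    return c, loads


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices one tile x tile output block at a time.

    Neighboring outputs in a block share staged copies of the input tiles
    (a stand-in for GPU shared memory), so each staged element is reused
    tile times. Assumes n is a multiple of tile.
    """
    assert n % tile == 0, "this sketch assumes n is a multiple of tile"
    loads = 0
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):
        for bj in range(0, n, tile):
            for bk in range(0, n, tile):
                # Stage one tile of a and one tile of b; count each element once.
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                loads += 2 * tile * tile
                for i in range(tile):
                    for j in range(tile):
                        acc = c[bi + i][bj + j]
                        for k in range(tile):
                            acc += a_tile[i][k] * b_tile[k][j]
                        c[bi + i][bj + j] = acc
    return c, loads


if __name__ == "__main__":
    n = 8
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    c_naive, naive_loads = naive_matmul(a, b, n)
    c_tiled, tiled_loads = tiled_matmul(a, b, n, tile=4)
    print(c_naive == c_tiled, naive_loads, tiled_loads)  # True 1024 256
```

Under this simplified model the naive version performs 2n³ loads and the tiled version 2n³/tile, which is the kind of reuse among neighboring outputs that the abstract describes. On a real GPU the staged tiles would live in shared memory and the loads would additionally need to be arranged for coalescing.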

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1837853.1693505

References: 2 articles.

Cited by 5 articles.

1. Research on Matrix Multiplication Based on the Combination of OpenACC and CUDA;Geo-informatics in Sustainable Ecosystem and Society;2019

2. Parameter based tuning model for optimizing performance on GPU;Cluster Computing;2017-07-01

3. Parameter Tuning Model for Optimizing Application Performance on GPU;2016 IEEE 1st International Workshops on Foundations and Applications of Self* Systems (FAS*W);2016-09

4. Improving branch divergence performance on GPGPU with a new PDOM stack and multi-level warp scheduling;Journal of Systems Architecture;2014-05

5. The Cetus Source-to-Source Compiler Infrastructure: Overview and Evaluation;International Journal of Parallel Programming;2012-08-10