Affiliation:
1. Lancaster University
2. University of Edinburgh
Abstract
General-purpose GPU-based systems are highly attractive, as they offer potentially massive performance at little cost. Realizing this potential is challenging, however, due to the complexity of programming. This article presents a compiler-based approach that automatically generates optimized OpenCL code from data-parallel OpenMP programs for GPUs. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures, and uses machine learning to automatically build a predictive model that determines whether it is worthwhile to run the OpenCL code on the GPU or the OpenMP code on the multicore host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on distinct GPU-based systems. We achieved average (up to) speedups of 4.51× and 4.20× (143× and 67×) on Core i7/NVIDIA GeForce GTX580 and Core i7/AMD Radeon 7970 platforms, respectively, over a sequential baseline. Our approach achieves, on average, greater than 10× speedups over two state-of-the-art automatic GPU code generators.
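The device-selection idea described in the abstract can be sketched as follows: extract simple code features from a data-parallel loop and feed them to a learned model that predicts whether the GPU or the multicore host will be faster. The feature names and thresholds below are purely illustrative assumptions, not the paper's actual trained model or feature set.

```python
# Hypothetical sketch of predictive device selection: map static code
# features of a data-parallel loop to a GPU-vs-CPU decision. The real
# system trains this model automatically; here a tiny hand-written
# decision tree stands in for the learned predictor.

def extract_features(loop):
    """Collect simple, illustrative code features for a data-parallel loop."""
    return {
        "iterations": loop["iterations"],          # parallel work available
        "transfer_bytes": loop["transfer_bytes"],  # host<->device traffic
        "ops_per_byte": loop["ops"] / max(loop["transfer_bytes"], 1),
    }

def predict_device(features):
    """Stand-in decision tree: thresholds are assumptions, not trained values."""
    if features["iterations"] < 10_000:
        return "CPU (OpenMP)"   # too little parallelism to amortize kernel launch
    if features["ops_per_byte"] < 0.5:
        return "CPU (OpenMP)"   # transfer-bound: host<->device copies dominate
    return "GPU (OpenCL)"

small_loop = {"iterations": 512, "transfer_bytes": 4096, "ops": 8192}
big_loop = {"iterations": 1_000_000, "transfer_bytes": 4_000_000, "ops": 40_000_000}

print(predict_device(extract_features(small_loop)))  # CPU (OpenMP)
print(predict_device(extract_features(big_loop)))    # GPU (OpenCL)
```

In the paper's setting, such a predictor lets the compiler emit both the OpenCL and OpenMP versions and defer the device choice to the model, which is what makes the mapping portable across the two evaluated platforms.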
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
28 articles.
1. Fuzzy Active Learning to Detect OpenCL Kernel Heterogeneous Machines in Cyber Physical Systems;IEEE Transactions on Fuzzy Systems;2022-11
2. Meta-Programming Design-Flow Patterns for Automating Reusable Optimisations;International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies;2022-06-09
3. Machine Learning for CUDA+MPI Design Rules;2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW);2022-05
4. Benchmarking optimization algorithms for auto-tuning GPU kernels;IEEE Transactions on Evolutionary Computation;2022
5. Optimizing Sparse Matrix Multiplications for Graph Neural Networks;Languages and Compilers for Parallel Computing;2022