Software pipelining for graphic processing unit acceleration: Partition, scheduling and granularity

Published: 2015-06-02 · Volume: 30 · Issue: 2 · Pages: 169-185
ISSN: 1094-3420
Journal: The International Journal of High Performance Computing Applications
Language: English

Authors:

Liu Bozhong¹, Qiu Weidong¹, Jiang Lin¹, Gong Zheng²

Affiliations:

1. School of Information Security Engineering, Shanghai Jiao Tong University, China

2. School of Computer Science, South China Normal University, China

Abstract

The graphic processing unit (GPU) is becoming increasingly popular as a performance accelerator for applications that require high-performance parallel computing. In a hybrid central processing unit (CPU)–GPU system, software pipelining is a key technique for delivering accelerated performance, and the central challenge is hiding CPU–GPU communication overhead by splitting a large task into small units. In this paper, we carry out a systematic investigation of task partitioning to achieve the maximum performance gain. We first validate the advantage of an even partition strategy, and then propose an optimal scheduling scheme, with a detailed study, within an analytical framework, of how to choose the optimal unit size (data granularity). Experiments on AMD and NVIDIA GPU platforms demonstrate that our approaches achieve approximately 31–59% performance improvement using software pipelining.
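The partition-and-granularity trade-off described in the abstract can be illustrated with a small cost model. The sketch below is not the authors' analytical framework: it assumes a hypothetical linear cost per stage (a fixed per-chunk overhead plus a per-MB cost), three stages (host-to-device copy, kernel, device-to-host copy) that each run on their own resource, and chunks processed in FIFO order. The `Stage` class, all function names, and the numbers are illustrative assumptions. Real GPUs may share one copy engine between both transfer directions, which this model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage under a hypothetical linear cost model."""
    name: str
    overhead: float  # assumed fixed cost per chunk (e.g. transfer latency or kernel launch), ms
    per_unit: float  # assumed cost per MB of data processed by this stage, ms/MB


def stage_time(stage: Stage, size: float) -> float:
    """Time one stage spends on a chunk of `size` MB."""
    return stage.overhead + stage.per_unit * size


def serial_time(stages: list[Stage], data: float) -> float:
    """No pipelining: the whole task passes through each stage once, back to back."""
    return sum(stage_time(s, data) for s in stages)


def pipelined_time(stages: list[Stage], data: float, n: int) -> float:
    """Makespan for an even partition into n chunks, one resource per stage.

    With identical chunks in a linear pipeline, the makespan is one chunk's
    trip through every stage plus (n - 1) extra periods of the slowest stage.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    times = [stage_time(s, data / n) for s in stages]
    return sum(times) + (n - 1) * max(times)


def simulate_makespan(stages: list[Stage], chunk_sizes: list[float]) -> float:
    """Flow-shop recurrence for arbitrary chunk sizes processed in FIFO order.

    A chunk starts a stage once it has left the previous stage and the stage
    has finished the previous chunk.
    """
    finish = [0.0] * len(stages)  # completion time of the latest chunk at each stage
    for size in chunk_sizes:
        ready = 0.0  # input data assumed to be available on the host at t = 0
        for i, stage in enumerate(stages):
            start = max(ready, finish[i])
            finish[i] = start + stage_time(stage, size)
            ready = finish[i]
    return finish[-1]


def best_chunk_count(stages: list[Stage], data: float, max_chunks: int = 1024) -> int:
    """Exhaustive search for the granularity (chunk count) with the smallest makespan."""
    return min(range(1, max_chunks + 1), key=lambda n: pipelined_time(stages, data, n))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    stages = [Stage("H2D", 0.5, 0.1), Stage("kernel", 0.2, 0.15), Stage("D2H", 0.5, 0.1)]
    data = 400.0  # MB
    n = best_chunk_count(stages, data)
    print(f"serial: {serial_time(stages, data):.1f} ms")
    print(f"best n: {n}, pipelined: {pipelined_time(stages, data, n):.1f} ms")
```

With these made-up parameters the search settles on n = 20 chunks, for a makespan of about 69 ms against about 141 ms without pipelining. Too few chunks leave most of the copy time exposed, while too many chunks pay the fixed per-chunk overhead over and over; the optimum sits between the two. `simulate_makespan` accepts arbitrary chunk sizes, so uneven partitions can be compared against the even one under the same assumptions.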

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342015585845

References (41 articles)

1. Resource-constrained software pipelining

2. A High-Performance Implementation of Differential Power Analysis on Graphics Cards

3. Improving GPU Performance Prediction with Data Transfer Modeling

4. Brook for GPUs

5. Parallelizing SOR for GPGPUs using alternate loop tiling

Cited by (8 articles)

1. PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization, 2023-12-14.

2. Large-Scale Simulation of Structural Dynamics Computing on GPU Clusters. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2023-11-11.

3. Comprehensive Survey of Consensus Docking for High-Throughput Virtual Screening. Molecules, 2022-12-25.

4. An Automatic Pipeline Parallel Acceleration Framework for Neural Network Models on Heterogeneous Computing Platforms. 2022 5th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), 2022-08-19.

5. CoCoPeLia: Communication-Computation Overlap Prediction for Efficient Linear Algebra on GPUs. 2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2021-03.