Modeling and Analyzing Evaluation Cost of CUDA Kernels

Published:2024-03-12 Issue:1 Volume:11 Page:1-53
ISSN:2329-4949
Container-title:ACM Transactions on Parallel Computing
language:en
Short-container-title:ACM Trans. Parallel Comput.

Author:

Muller Stefan K.¹, Hoffmann Jan²

Affiliation:

1. Illinois Institute of Technology, Chicago, USA

2. Carnegie Mellon University, Pittsburgh, USA

Abstract

Motivated by the increasing importance of general-purpose Graphics Processing Unit (GPGPU) programming, exemplified by NVIDIA’s CUDA framework, as well as the difficulty, especially for novice programmers, of reasoning about performance in GPGPU kernels, we introduce a novel quantitative program logic for CUDA kernels. The logic allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks: warp divergences, uncoalesced memory accesses, and bank conflicts. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic, and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of RaCUDA, an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.
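To make the three bottlenecks named in the abstract concrete, the following is a minimal Python sketch that counts each of them for a single warp. It is not the paper's cost semantics and not part of RaCUDA. The function names are hypothetical, and the hardware parameters are simplifying assumptions chosen for illustration: 32 threads per warp, 32 shared-memory banks of 4-byte words, and 128-byte global-memory segments.

```python
# Illustrative model only: all constants below are assumptions, not values from the paper.
WARP_SIZE = 32        # threads per warp (assumed)
NUM_BANKS = 32        # shared-memory banks, one 4-byte word wide (assumed)
SEGMENT_BYTES = 128   # size of one global-memory transaction (assumed)


def global_transactions(byte_addresses):
    """Count the distinct 128-byte segments touched by one warp-wide access.

    A fully coalesced access touches 1 segment; an uncoalesced one touches many.
    """
    return len({addr // SEGMENT_BYTES for addr in byte_addresses})


def bank_conflict_degree(word_indices):
    """Return the largest number of distinct words that fall into the same bank.

    1 means conflict-free. Threads reading the same word are treated as a
    broadcast, so duplicates are removed first.
    """
    words_per_bank = {}
    for word in set(word_indices):
        words_per_bank.setdefault(word % NUM_BANKS, set()).add(word)
    return max(len(words) for words in words_per_bank.values())


def divergent_paths(predicates):
    """Count the branch outcomes present in a warp: 1 if uniform, 2 if divergent."""
    return len(set(predicates))


tids = range(WARP_SIZE)

# Global memory: consecutive 4-byte elements versus a stride of 32 elements.
coalesced = global_transactions([4 * t for t in tids])
strided = global_transactions([4 * 32 * t for t in tids])

# Shared memory: unit stride versus stride 2 (two words per even bank).
conflict_free = bank_conflict_degree([t for t in tids])
conflicted = bank_conflict_degree([2 * t for t in tids])

# Control flow: a uniform condition versus one that depends on thread parity.
uniform = divergent_paths([True for _ in tids])
diverged = divergent_paths([t % 2 == 0 for t in tids])

print(coalesced, strided, conflict_free, conflicted, uniform, diverged)
```

Under these assumptions, the strided global access needs one transaction per thread, the stride-2 shared-memory access serializes into two rounds, and the parity-dependent branch forces the warp to execute both paths. These are the kinds of costs a resource bound for a kernel would need to account for.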

Funder

United States Air Force and DARPA

National Science Foundation

SHF

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3639403

References (46 articles)

1. Cost Analysis of Concurrent OO Programs

2. GPUDrano: Detecting Uncoalesced Accesses in GPU Programs

3. Amortised Resource Analysis with Separation Logic

4. Benchmarking the Cost of Thread Divergence in CUDA

5. Parallelism in sequential functional languages

Cited by 2 articles.

1. Modeling and Analyzing Evaluation Cost of CUDA Kernels; ACM Transactions on Parallel Computing; 2024-03-12

2. An Innovative Exploration of Deep Learning for Pesticide and Veterinary Drug Development: A Molecular Generative Model Based on Scaffold Structure Mha-Rnn; 2024