Modeling and analyzing evaluation cost of CUDA kernels

Published: 2021-01-04, Issue: POPL, Volume: 5, Pages: 1-31
ISSN: 2475-1421
Container-title: Proceedings of the ACM on Programming Languages
Language: English
Short-container-title: Proc. ACM Program. Lang.

Authors:

Stefan K. Muller¹, Jan Hoffmann²

Affiliations:

1. Illinois Institute of Technology, USA

2. Carnegie Mellon University, USA

Abstract

General-purpose programming on GPUs (GPGPU) is becoming increasingly in vogue as applications such as machine learning and scientific computing demand high throughput in vector-parallel applications. NVIDIA's CUDA toolkit seeks to make GPGPU programming accessible by allowing programmers to write GPU functions, called kernels, in a small extension of C/C++. However, due to CUDA's complex execution model, the performance characteristics of CUDA kernels are difficult to predict, especially for novice programmers. This paper introduces a novel quantitative program logic for CUDA kernels, which allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of RaCuda, an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of CUDA benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.
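To make "CUDA-specific performance bottlenecks" concrete, the sketch below evaluates two that are widely discussed in CUDA performance guides, uncoalesced global-memory accesses and shared-memory bank conflicts, for a single warp executing the access `a[base + tid * stride]`. It is an illustrative toy model, not the cost semantics or logic from the paper. The parameters (32 threads per warp, 128-byte global-memory segments, 32 shared-memory banks of 4-byte words) and all function names are assumptions chosen for illustration; real transaction granularity varies by GPU architecture.

```python
# A toy, per-warp cost model for two well-known CUDA memory bottlenecks.
# Assumptions (illustrative, not taken from the paper): 32 threads per warp,
# 128-byte global-memory segments, 32 shared-memory banks of 4-byte words.
# All names below are hypothetical.

WARP_SIZE = 32
SEGMENT_BYTES = 128
NUM_BANKS = 32
WORD_BYTES = 4


def warp_addresses(base, stride, elem_bytes=4, warp_size=WARP_SIZE):
    """Byte address read by each thread tid for the access a[base + tid * stride]."""
    return [(base + tid * stride) * elem_bytes for tid in range(warp_size)]


def global_transactions(addresses, segment_bytes=SEGMENT_BYTES):
    """Number of distinct aligned segments the warp touches (1 = fully coalesced)."""
    return len({addr // segment_bytes for addr in addresses})


def bank_conflict_degree(addresses, num_banks=NUM_BANKS, word_bytes=WORD_BYTES):
    """Worst-case number of distinct words mapped to one bank (1 = conflict-free).

    Threads reading the same word are served by a broadcast, so only
    distinct words in the same bank count toward the conflict.
    """
    words_in_bank = {}
    for addr in addresses:
        word = addr // word_bytes
        words_in_bank.setdefault(word % num_banks, set()).add(word)
    return max(len(words) for words in words_in_bank.values())


if __name__ == "__main__":
    for stride in (1, 2, 32):
        addrs = warp_addresses(0, stride)
        print(f"stride={stride}: "
              f"global transactions={global_transactions(addrs)}, "
              f"bank conflict degree={bank_conflict_degree(addrs)}")
```

Under these assumptions, a stride of 1 costs 1 global transaction with no bank conflict, a stride of 2 costs 2 of each, and a stride of 32 costs 32 of each. A static analysis of the kind described above would have to bound such costs symbolically over all warps and loop iterations rather than evaluate one concrete access as this sketch does.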

Funders

Defense Advanced Research Projects Agency

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality; Software

Link

https://dl.acm.org/doi/pdf/10.1145/3434306

References: 14 articles

1. Cost Analysis of Concurrent OO Programs

2. Linear Dependent Types and Relative Completeness

3. Scalable SMT-based verification of GPU kernel functions

4. GKLEE

5. Latency-Hiding Work Stealing

Cited by: 8 articles

1. (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms. ACM Transactions on Programming Languages and Systems, 2024-05-22.

2. TrackFM: Far-out Compiler Support for a Far Memory World. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2024-04-17.

3. Modeling and Analyzing Evaluation Cost of CUDA Kernels. ACM Transactions on Parallel Computing, 2024-03-12.

4. Automatic Static Analysis-Guided Optimization of CUDA Kernels. Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores, 2024-03-03.

5. Systematic Literature Review on Machine Learning and its Impact on APIs Deployment. Computación y Sistemas, 2023-12-27.