Affiliation
1. Illinois Institute of Technology, USA
2. Carnegie Mellon University, USA
Abstract
General-purpose programming on GPUs (GPGPU) is becoming increasingly popular as domains such as machine learning and scientific computing demand high throughput from vector-parallel workloads. NVIDIA's CUDA toolkit seeks to make GPGPU programming accessible by allowing programmers to write GPU functions, called kernels, in a small extension of C/C++. However, due to CUDA's complex execution model, the performance characteristics of CUDA kernels are difficult to predict, especially for novice programmers.
This paper introduces a novel quantitative program logic for CUDA kernels, which allows programmers to reason about both functional correctness and resource usage of CUDA kernels, paying particular attention to a set of common but CUDA-specific performance bottlenecks. The logic is proved sound with respect to a novel operational cost semantics for CUDA kernels. The semantics, logic and soundness proofs are formalized in Coq. An inference algorithm based on LP solving automatically synthesizes symbolic resource bounds by generating derivations in the logic. This algorithm is the basis of RaCuda, an end-to-end resource-analysis tool for kernels, which has been implemented using an existing resource-analysis tool for imperative programs. An experimental evaluation on a suite of CUDA benchmarks shows that the analysis is effective in aiding the detection of performance bugs in CUDA kernels.
Funder
Defense Advanced Research Projects Agency
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality; Software
Cited by
8 articles.
1. (De/Re)-Composition of Data-Parallel Computations via Multi-Dimensional Homomorphisms;ACM Transactions on Programming Languages and Systems;2024-05-22
2. TrackFM: Far-out Compiler Support for a Far Memory World;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1;2024-04-17
3. Modeling and Analyzing Evaluation Cost of CUDA Kernels;ACM Transactions on Parallel Computing;2024-03-12
4. Automatic Static Analysis-Guided Optimization of CUDA Kernels;Proceedings of the 15th International Workshop on Programming Models and Applications for Multicores and Manycores;2024-03-03
5. Systematic Literature Review on Machine Learning and its Impact on APIs Deployment;Computación y Sistemas;2023-12-27