COX: Exposing CUDA Warp-level Functions to CPUs

Published: 2022-09-16 | Volume: 19 | Issue: 4 | Pages: 1-25
ISSN: 1544-3566
Journal: ACM Transactions on Architecture and Code Optimization
Language: English
Abbreviated journal title: ACM Trans. Archit. Code Optim.

Authors:

Ruobing Han¹, Jaewon Lee¹, Jaewoong Sim², Hyesoon Kim¹

Affiliations:

1. Georgia Institute of Technology, North Avenue, Atlanta, GA, USA

2. Seoul National University, Gwanak-gu, Seoul, South Korea

Abstract

As CUDA becomes the de facto programming language for data-parallel applications such as high-performance computing and machine learning, running CUDA on other platforms becomes a compelling option. Although several efforts have attempted to support CUDA on devices other than NVIDIA GPUs, the extra translation steps they require mean that their support always lags a few years behind CUDA's latest features. In particular, the new CUDA programming model exposes the warp concept in the programming language, which greatly changes how CUDA code should be mapped to CPU programs. This article proposes hierarchical collapsing, a technique that correctly supports CUDA warp-level functions on CPUs. To verify hierarchical collapsing, we build a framework, COX, that executes CUDA source code on a CPU backend. With hierarchical collapsing, 90% of the kernels in the CUDA SDK samples can be executed on CPUs, much higher than the 68% achieved by previous work. We also evaluate performance with benchmarks from real applications and show that, in general, hierarchical collapsing generates CPU programs whose performance is comparable to or higher than that of previous projects.
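The abstract does not spell out the transformation, so what follows is only a minimal sketch of the general idea under stated assumptions, not COX's actual generated code. It assumes a simple warp-sum kernel (shown in the leading comment) and a warp size of 32; the names `shfl_down` and `warp_sum_cpu` are hypothetical. If a thread block were collapsed into one loop over its threads that is split only at `__syncthreads()`, lane 0 would reach `__shfl_down_sync` before lane 16 had produced the value it needs. The sketch instead nests intra-warp loops over the 32 lanes inside an inter-warp loop and splits the intra-warp loop at each shuffle, so every lane's value is ready before any lane reads it.

```python
# CUDA kernel assumed for this sketch (one thread block, warp size 32):
#
#   __global__ void warp_sum(const float* in, float* out) {
#       float v = in[threadIdx.x];
#       for (int offset = 16; offset > 0; offset /= 2)
#           v += __shfl_down_sync(0xffffffff, v, offset);
#       if (threadIdx.x % 32 == 0) out[threadIdx.x / 32] = v;
#   }

WARP_SIZE = 32


def shfl_down(vals, lane, offset):
    """Emulate __shfl_down_sync with a full mask and width 32: a lane reads
    the value held by lane + offset, or keeps its own if that lane is out of range."""
    src = lane + offset
    return vals[src] if src < WARP_SIZE else vals[lane]


def warp_sum_cpu(inp, block_dim):
    """Hypothetical CPU translation of warp_sum for one block, written as an
    inter-warp loop around intra-warp loops over the 32 lanes."""
    if block_dim % WARP_SIZE != 0:
        raise ValueError("block_dim must be a multiple of WARP_SIZE")
    num_warps = block_dim // WARP_SIZE
    out = [0.0] * num_warps
    for warp in range(num_warps):  # inter-warp loop
        # Intra-warp loop: the code before the first shuffle, once per lane.
        v = [inp[warp * WARP_SIZE + lane] for lane in range(WARP_SIZE)]
        offset = WARP_SIZE // 2
        while offset > 0:
            # Each shuffle is an implicit warp-level barrier, so the
            # intra-warp loop is split here: the new list is built only from
            # the previous step's values, which are complete for all lanes.
            v = [v[lane] + shfl_down(v, lane, offset) for lane in range(WARP_SIZE)]
            offset //= 2
        # Final intra-warp loop: only lane 0 writes the warp's result.
        for lane in range(WARP_SIZE):
            if lane == 0:
                out[warp] = v[lane]
    return out
```

For a 64-thread block whose inputs are 0 through 63, `warp_sum_cpu(list(range(64)), 64)` returns `[496, 1520]`, the sum of each warp's 32 inputs. Building a fresh per-lane list at every shuffle step plays the role of the loop split: no lane can observe a value from the current step before all lanes have finished the previous one.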

Funder

Booz Allen Hamilton Inc.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3554736

Cited by 4 articles.

1. CuPBoP: Making CUDA a Portable Language. ACM Transactions on Design Automation of Electronic Systems, 2024-06-21.

2. Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024-05-27.

3. OpenMP Kernel Language Extensions for Performance Portable GPU Codes. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023-11-12.

4. High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs. Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2023-02-21.