Performance analysis of CUDA, OpenACC and OpenMP programming models on TESLA V100 GPU

Published: 2021-01-01
Journal: Journal of Physics: Conference Series (J. Phys.: Conf. Ser.)
Volume: 1740, Issue: 1, Article: 012056
ISSN: 1742-6588

Authors

Khalilov Mikhail, Timoveev Alexey

Abstract

Graphics processors are widely used as accelerators in modern supercomputers. The ability to perform efficient parallelization and low-level optimization allows scientists to greatly boost the performance of their codes. Modern Nvidia GPUs support low-level approaches, such as CUDA, along with high-level approaches: OpenACC and OpenMP. While the low-level approach aims to exploit all capabilities of the SIMT GPU architecture through low-level C/C++ code, it requires significant effort from the programmer. The OpenACC and OpenMP programming models take the opposite approach: the programmer only has to mark the blocks of code to be parallelized with pragmas. We compare the performance of CUDA, OpenMP and OpenACC on the state-of-the-art Nvidia Tesla V100 GPU in several typical scenarios that arise in scientific programming, such as matrix multiplication and regular memory access patterns, and evaluate the performance of physical simulation codes implemented with these programming models. Moreover, we study the performance of matrix multiplication implemented in vendor-optimized BLAS libraries for the Nvidia Tesla V100 GPU and a modern Intel Xeon processor.
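The central contrast in the abstract is that OpenACC and OpenMP let the programmer keep ordinary loops and request GPU execution with pragmas. The sketch below illustrates this in C for two of the scenarios named above, a matrix multiplication and a regular (unit-stride) memory access pattern. It is a minimal example under stated assumptions, not the authors' benchmark code: the function names `matmul` and `triad` are hypothetical, the triad kernel is a STREAM-style stand-in for "regular memory access patterns", and GPU offloading assumes either an OpenMP 4.5+ compiler with target offload support (e.g. `-fopenmp` plus an offload toolchain) or an OpenACC compiler that defines `_OPENACC` (e.g. `nvc -acc`). A CUDA version of the same loops would instead need an explicit `__global__` kernel, thread-index arithmetic and `cudaMalloc`/`cudaMemcpy` calls.

```c
/* Hypothetical helper names; illustrative kernels, not the paper's code.
 * Built without -fopenmp / -fopenacc, the pragmas are ignored and both
 * functions run serially on the host. */

/* Naive C = A * B for n x n row-major float matrices. */
void matmul(int n, const float *restrict a, const float *restrict b,
            float *restrict c)
{
#if defined(_OPENACC)
    /* OpenACC: copy A and B to the device, copy C back when done. */
    #pragma acc parallel loop collapse(2) \
        copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
#else
    /* OpenMP 4.5+ target offload with explicit data mapping. */
    #pragma omp target teams distribute parallel for collapse(2) \
        map(to: a[0:n*n], b[0:n*n]) map(from: c[0:n*n])
#endif
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;  /* dot product of row i of A and column j of B */
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

/* STREAM-style triad a[i] = b[i] + s * c[i]: unit-stride, regular
 * accesses; for large n its throughput is bounded by memory bandwidth. */
void triad(int n, float *restrict a, const float *restrict b,
           const float *restrict c, float s)
{
#if defined(_OPENACC)
    #pragma acc parallel loop copyin(b[0:n], c[0:n]) copyout(a[0:n])
#else
    #pragma omp target teams distribute parallel for \
        map(to: b[0:n], c[0:n]) map(from: a[0:n])
#endif
    for (int i = 0; i < n; ++i)
        a[i] = b[i] + s * c[i];
}
```

Compiled without the offload flags, the same source runs serially on the host, which gives a simple correctness check before targeting the GPU. The naive triple loop in `matmul` does no tiling or on-chip data reuse, so it should not be expected to approach a vendor-tuned routine such as cuBLAS `cublasSgemm`; this is why vendor BLAS libraries are a natural reference point for matrix multiplication.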

Publisher

IOP Publishing

Subject

General Physics and Astronomy

Link

https://iopscience.iop.org/article/10.1088/1742-6596/1740/1/012056/pdf

Cited by 17 articles.

1. A Portable Tool to Compare Performance Profiles from GPU Offloading Programming Models. Proceedings of the 21st ACM International Conference on Computing Frontiers, 2024-05-07.

2. Multi-GPU UNRES for scalable coarse-grained simulations of very large protein systems. Computer Physics Communications, 2024-05.

3. Identification of Education Activity Based on Datalake Captured from Internal Sensor Data of Supercomputer. 2024 IEEE 7th Eurasian Conference on Educational Innovation (ECEI), 2024-01-26.

4. MIMD Programs Execution Support on SIMD Machines: A Holistic Survey. IEEE Access, 2024.

5. Energy Efficiency of Multithreaded WZ Factorization with the Use of OpenMP and OpenACC on CPU and GPU. Lecture Notes in Computer Science, 2024.