An Early Performance Comparison of CUDA and OpenACC


Li Xuechao, Shih Po-Chou


This paper presents a performance comparison between CUDA and OpenACC. The performance analysis focuses on the programming models and their underlying compilers. In addition, we propose a Performance Ratio of Data Sensitivity (PRoDS) metric that objectively quantifies a property traditionally compared only subjectively: how sensitive OpenACC and CUDA implementations are to changes in data size. The results show that, in terms of kernel running time, OpenACC performance is lower than CUDA performance because the PGI compiler must translate OpenACC kernels into object code, whereas CUDA code can be run directly. Furthermore, with optimizations, OpenACC programs are more sensitive to data-size changes than the equivalent CUDA programs; without optimizations, the CUDA programs are more sensitive than the OpenACC ones. Overall, we find that OpenACC is a reliable programming model and a good alternative to CUDA for accelerator devices.


EDP Sciences


