Fine‐Grained Memory Profiling of GPGPU Kernels

Published: 2022-10  Volume: 41  Issue: 7  Pages: 227-235
ISSN: 0167-7055
Container-title: Computer Graphics Forum
Language: en
Short-container-title: Computer Graphics Forum

Author:

von Buelow, Max¹; Guthe, Stefan¹; Fellner, Dieter W.¹,²

Affiliation:

1. Technical University of Darmstadt, Germany

2. Fraunhofer IGD, Germany, and Graz University of Technology, Institute of Computer Graphics and Knowledge Visualization, Austria

Abstract

Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations in both hardware and software mandatory. While hardware vendors already employ highly efficient caching architectures, software engineers usually have to organize their data so that it makes efficient use of these caches, which requires deep knowledge of the actual hardware. In this paper we present a novel technique for fine‐grained memory profiling that simulates the whole pipeline of memory flow and accumulates profiling values separately for each allocation, so that the user retains information about which region of the GPU program is potentially affected. In terms of accuracy, our memory simulator outperforms state‐of‐the‐art memory models of NVIDIA architectures by a factor of 2.4 for the L1 cache and 1.3 for the L2 cache. Additionally, we find fine‐grained memory profiling to be a useful tool for memory optimization, as we demonstrate for ray tracing and machine learning applications.
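The abstract describes pushing memory requests through a simulated memory pipeline and accumulating the results per allocation rather than per kernel. The Python sketch below illustrates only that general idea under simplifying assumptions. It is not the authors' simulator: the classes `LRUCache`, `AllocationTable` and `MemoryProfiler`, the cache sizes, the associativity, the shared 128-byte line size and LRU replacement are all hypothetical choices, and real NVIDIA caches (sectored lines, write policies, and so on) are not modeled.

```python
from bisect import bisect_right
from collections import OrderedDict, defaultdict


class LRUCache:
    """Hypothetical set-associative cache with LRU replacement (no sectors, no write-backs)."""

    def __init__(self, size_bytes, line_bytes, ways):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; key order tracks recency (last = most recent).
        self.sets = [OrderedDict() for _ in range(self.num_sets)]

    def access(self, line):
        """Look up a cache-line index; return True on hit, False on miss."""
        s = self.sets[line % self.num_sets]
        if line in s:
            s.move_to_end(line)  # hit: mark line as most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)  # miss in a full set: evict the LRU line
        s[line] = True
        return False


class AllocationTable:
    """Maps a device address to the name of the allocation containing it."""

    def __init__(self):
        self.bases = []    # sorted base addresses
        self.entries = []  # (base, size, name), same order as self.bases

    def add(self, name, base, size):
        i = bisect_right(self.bases, base)
        self.bases.insert(i, base)
        self.entries.insert(i, (base, size, name))

    def lookup(self, addr):
        i = bisect_right(self.bases, addr) - 1
        if i >= 0:
            base, size, name = self.entries[i]
            if addr < base + size:
                return name
        return "<unknown>"


class MemoryProfiler:
    """Pushes warp loads through coalescing -> L1 -> L2 -> DRAM and
    accumulates the outcome per allocation instead of per kernel."""

    def __init__(self, allocations, l1, l2):
        assert l1.line_bytes == l2.line_bytes, "sketch assumes equal line sizes"
        self.allocs = allocations
        self.l1, self.l2 = l1, l2
        self.stats = defaultdict(lambda: defaultdict(int))

    def warp_load(self, addresses):
        # Coalescing: threads touching the same line share one request;
        # keep the first address per line for the allocation lookup.
        lines = {}
        for a in addresses:
            lines.setdefault(a // self.l1.line_bytes, a)
        for line, addr in sorted(lines.items()):
            st = self.stats[self.allocs.lookup(addr)]
            st["requests"] += 1
            if self.l1.access(line):
                st["l1_hits"] += 1
            elif self.l2.access(line):
                st["l2_hits"] += 1
            else:
                st["dram"] += 1


# Usage with hypothetical sizes: 16 KiB 4-way L1, 256 KiB 16-way L2, 128 B lines.
allocs = AllocationTable()
allocs.add("vertices", 0x0000, 4096)
allocs.add("indices", 0x1000, 4096)
prof = MemoryProfiler(allocs, LRUCache(16 * 1024, 128, 4),
                      LRUCache(256 * 1024, 128, 16))
coalesced = [4 * t for t in range(32)]  # 32 consecutive floats in one line
prof.warp_load(coalesced)
prof.warp_load(coalesced)  # reuse of the same line
prof.warp_load([0x1000 + 128 * t for t in range(32)])  # one line per thread
for name, st in prof.stats.items():
    print(name, dict(st))
```

Because the counters are keyed by allocation rather than by kernel, the strided pattern on `indices` stands out (32 requests, all served from DRAM), while the coalesced loads on `vertices` need one request per warp and hit in L1 on reuse.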

Funder

Deutsche Forschungsgemeinschaft

Publisher

Wiley

Subject

Computer Graphics and Computer-Aided Design

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1111/cgf.14671
