Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis

Published: 2018-12-31 Issue: 4 Volume: 15 Page: 1-24
ISSN: 1544-3566
Container-title: ACM Transactions on Architecture and Code Optimization
Language: en
Short-container-title: ACM Trans. Archit. Code Optim.

Author:

Kiani Mohsen¹, Rajabzadeh Amir¹

Affiliation:

1. Razi University, Taghe-Bostan, Kermanshah, Iran

Abstract

Reuse distance analysis (RDA) is a popular method for calculating locality profiles and modeling cache performance. The present article proposes a framework to apply the RDA algorithm to obtain reuse distance profiles in graphics processing unit (GPU) kernels. To study the implications of hardware-related parameters in RDA, two RDA algorithms were employed, including a high-level cache-independent RDA algorithm, called HLRDA, and a detailed RDA algorithm, called DRDA. DRDA models the effects of reservation fails in cache blocks and miss status holding registers to provide accurate cache-related performance metrics. In this case, the reuse profiles are cache-specific. In a selection of GPU kernels, DRDA obtained the L1 miss-rate breakdowns with an average error of 3.86% and outperformed the state-of-the-art RDA in terms of accuracy. In terms of performance, DRDA is 246,000× slower than the real GPU executions and 11× faster than GPGPU-Sim. HLRDA ignores the cache-related parameters and its obtained reuse profiles are general, which can be used to calculate miss rates in all cache sizes. Moreover, the average error incurred by HLRDA was 16.9%.
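To illustrate how a cache-independent reuse profile can yield miss rates for several cache sizes, here is a minimal Python sketch of textbook reuse (stack) distance analysis. It is not the authors' HLRDA or DRDA implementation. The function names `reuse_distances` and `miss_rate`, the 128-byte default line size, and the fully associative LRU cache are all assumptions made for illustration. The sketch ignores the reservation fails and miss status holding registers that DRDA models.

```python
from math import inf


def reuse_distances(trace, line_size=128):
    """Return one reuse distance per access in an address trace.

    The reuse distance is the number of distinct cache lines touched
    since the previous access to the same line; a first-time (cold)
    access gets an infinite distance. line_size is an assumed value.
    """
    stack = []       # LRU stack of line ids, most recently used at the end
    distances = []
    for addr in trace:
        line = addr // line_size
        if line in stack:
            idx = stack.index(line)
            # Lines above idx in the stack were touched since the last use.
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(inf)
        stack.append(line)
    return distances


def miss_rate(distances, cache_lines):
    """Miss rate of a fully associative LRU cache holding cache_lines lines.

    An access misses when its reuse distance is at least the capacity.
    """
    misses = sum(1 for d in distances if d >= cache_lines)
    return misses / len(distances)


if __name__ == "__main__":
    # Hypothetical trace: three lines accessed cyclically, twice.
    trace = [0, 128, 256, 0, 128, 256]
    profile = reuse_distances(trace)
    for capacity in (2, 3):
        print(capacity, miss_rate(profile, capacity))
```

The profile is computed once and then reused for each capacity, which is the property the abstract attributes to cache-independent reuse profiles. The list-based stack costs time proportional to the number of distinct lines per access; tree-based structures are the usual choice for long traces.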

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3291051

Reference: 40 articles.

1. Calculating stack distances efficiently

2. Mosaic

3. Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis

4. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors

Cited by 11 articles.

1. Design and performance analysis of modern computational storage devices: A systematic review;Expert Systems with Applications;2024-09

2. Snake: A Variable-length Chain-based Prefetching for GPUs;56th Annual IEEE/ACM International Symposium on Microarchitecture;2023-10-28

3. GCoM;Proceedings of the 49th Annual International Symposium on Computer Architecture;2022-06-11

4. Hybrid, scalable, trace-driven performance modeling of GPGPUs;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2021-11-13

5. Analytical Modeling the Multi-Core Shared Cache Behavior With Considerations of Data-Sharing and Coherence;IEEE Access;2021