Access Pattern-Aware Cache Management for Improving Data Utilization in GPU

Author:

Koo Gunjae 1, Oh Yunho 2, Ro Won Woo 2, Annavaram Murali 1

Affiliation:

1. University of Southern California

2. Yonsei University

Abstract

The long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (collections of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought in by a warp load instruction is used only once, which is classified as streaming data; (2) data brought in by a warp load is reused multiple times within the same warp, called intra-warp locality; (3) data brought in by a warp is reused multiple times but across different warps, called inter-warp locality; (4) some data exhibit a mix of both intra- and inter-warp locality. Furthermore, each load instruction consistently exhibits the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization. Using an extensive set of simulations we show that APCM improves the performance of GPUs by 34% for cache-sensitive applications while saving 27% of energy consumption over a baseline GPU.
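The classification described in the abstract can be illustrated with a small sketch. The snippet below is not the paper's implementation; it is a hypothetical model in which a monitored load's accesses are recorded as (warp ID, address) pairs, reuse by the first-touching warp counts as intra-warp locality, reuse by any other warp counts as inter-warp locality, and the resulting type selects an illustrative cache action (the streaming-bypass and intra-warp-pin mapping follows the abstract; the remaining entries are assumptions):

```python
from collections import defaultdict

STREAMING, INTRA_WARP, INTER_WARP, MIXED = "streaming", "intra-warp", "inter-warp", "mixed"

def classify_load(accesses):
    """Classify one load instruction's locality type from a monitored trace.

    `accesses` is a list of (warp_id, address) pairs observed for that load PC,
    e.g. from one exemplary warp's execution plus any sharers of its data.
    """
    first_toucher = {}   # address -> warp that first loaded it
    intra = inter = 0    # reuse counters
    for warp, addr in accesses:
        if addr not in first_toucher:
            first_toucher[addr] = warp          # cold miss, no reuse yet
        elif warp == first_toucher[addr]:
            intra += 1                          # reused by the same warp
        else:
            inter += 1                          # reused by a different warp
    if intra and inter:
        return MIXED
    if intra:
        return INTRA_WARP
    if inter:
        return INTER_WARP
    return STREAMING                            # every address touched once

def cache_policy(locality):
    """Map a detected locality type to an illustrative cache action."""
    return {STREAMING: "bypass", INTRA_WARP: "pin",
            INTER_WARP: "default", MIXED: "pin"}[locality]

# Example traces for three hypothetical load PCs
trace_stream = [(0, 0x100), (0, 0x140), (1, 0x180)]            # no reuse
trace_intra  = [(0, 0x200), (0, 0x200), (1, 0x240), (1, 0x240)]
trace_inter  = [(0, 0x300), (1, 0x300), (2, 0x300)]

print(classify_load(trace_stream))  # streaming
print(classify_load(trace_intra))   # intra-warp
print(classify_load(trace_inter))   # inter-warp
```

The key property this models is that classification is done once per static load instruction, so the resulting bypass or pin decision can be applied to every warp executing that load without per-warp bookkeeping.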

Funder

National Science Foundation

National Research Foundation of Korea

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)


Cited by 11 articles.

1. OSM: Off-Chip Shared Memory for GPUs. IEEE Transactions on Parallel and Distributed Systems, 2022-12-01.

2. Aggressive GPU cache bypassing with monolithic 3D-based NoC. The Journal of Supercomputing, 2022-10-21.

3. A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization, 2022-08-22.

4. Criticality-aware priority to accelerate GPU memory access. The Journal of Supercomputing, 2022-07-06.

5. On the Effects of Transaction Data Access Patterns on Performance in Lock-based Concurrency Control. IEEE Transactions on Computers, 2022.
