Access Pattern-Aware Cache Management for Improving Data Utilization in GPU

Author:

Koo Gunjae 1, Oh Yunho 2, Ro Won Woo 2, Annavaram Murali 1

Affiliation:

1. University of Southern California

2. Yonsei University

Abstract

The long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (collections of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space. In this paper we discover that individual load instructions in a warp exhibit four different types of data locality behavior: (1) data brought in by a warp load instruction is used only once, which is classified as streaming data; (2) data brought in by a warp load is reused multiple times within the same warp, called intra-warp locality; (3) data brought in by a warp is reused multiple times but across different warps, called inter-warp locality; (4) some data exhibit a mix of both intra- and inter-warp locality. Furthermore, each load instruction consistently exhibits the same locality type across all warps within a GPU kernel. Based on this discovery we argue that cache management must be done using per-load locality type information, rather than applying warp-wide cache management policies. We propose Access Pattern-aware Cache Management (APCM), which dynamically detects the locality type of each load instruction by monitoring the accesses from one exemplary warp. APCM then uses the detected locality type to selectively apply cache bypassing and cache pinning of data based on load locality characterization. Using an extensive set of simulations we show that APCM improves the performance of GPUs by 34% for cache-sensitive applications while saving 27% of energy consumption over a baseline GPU.
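The classification described in the abstract can be illustrated with a small sketch. The snippet below is not the paper's implementation; it is a hypothetical model in which a monitored load's accesses are recorded as (warp ID, address) pairs, reuse by the first-touching warp counts as intra-warp locality, reuse by any other warp counts as inter-warp locality, and the resulting type selects an illustrative cache action (the streaming-bypass and intra-warp-pin mapping follows the abstract; the remaining entries are assumptions):

```python
from collections import defaultdict

STREAMING, INTRA_WARP, INTER_WARP, MIXED = "streaming", "intra-warp", "inter-warp", "mixed"

def classify_load(accesses):
    """Classify one load instruction's locality type from a monitored trace.

    `accesses` is a list of (warp_id, address) pairs observed for that load PC,
    e.g. from one exemplary warp's execution plus any sharers of its data.
    """
    first_toucher = {}   # address -> warp that first loaded it
    intra = inter = 0    # reuse counters
    for warp, addr in accesses:
        if addr not in first_toucher:
            first_toucher[addr] = warp          # cold miss, no reuse yet
        elif warp == first_toucher[addr]:
            intra += 1                          # reused by the same warp
        else:
            inter += 1                          # reused by a different warp
    if intra and inter:
        return MIXED
    if intra:
        return INTRA_WARP
    if inter:
        return INTER_WARP
    return STREAMING                            # every address touched once

def cache_policy(locality):
    """Map a detected locality type to an illustrative cache action."""
    return {STREAMING: "bypass", INTRA_WARP: "pin",
            INTER_WARP: "default", MIXED: "pin"}[locality]

# Example traces for three hypothetical load PCs
trace_stream = [(0, 0x100), (0, 0x140), (1, 0x180)]            # no reuse
trace_intra  = [(0, 0x200), (0, 0x200), (1, 0x240), (1, 0x240)]
trace_inter  = [(0, 0x300), (1, 0x300), (2, 0x300)]

print(classify_load(trace_stream))  # streaming
print(classify_load(trace_intra))   # intra-warp
print(classify_load(trace_inter))   # inter-warp
```

The key property this models is that classification is done once per static load instruction, so the resulting bypass or pin decision can be applied to every warp executing that load without per-warp bookkeeping.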

Funder

National Science Foundation

National Research Foundation of Korea

Defense Advanced Research Projects Agency

Publisher

Association for Computing Machinery (ACM)


Cited by 11 articles.

1. OSM: Off-Chip Shared Memory for GPUs. IEEE Transactions on Parallel and Distributed Systems, 2022-12-01.

2. Aggressive GPU cache bypassing with monolithic 3D-based NoC. The Journal of Supercomputing, 2022-10-21.

3. A Case for Fine-grain Coherence Specialization in Heterogeneous Systems. ACM Transactions on Architecture and Code Optimization, 2022-08-22.

4. Criticality-aware priority to accelerate GPU memory access. The Journal of Supercomputing, 2022-07-06.

5. On the Effects of Transaction Data Access Patterns on Performance in Lock-based Concurrency Control. IEEE Transactions on Computers, 2022.
