LTRF

Published:2018-11-30 Issue:2 Volume:53 Page:489-502
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Sadrosadati Mohammad¹, Mirhosseini Amirhossein², Ehsani Seyed Borna³, Sarbazi-Azad Hamid⁴, Drumond Mario⁵, Falsafi Babak⁵, Ausavarungnirun Rachata⁶, Mutlu Onur⁷

Affiliation:

1. Sharif University of Technology & ETH Zurich, Tehran, Iran

2. University of Michigan, Ann Arbor, MI, USA

3. Sharif University of Technology, Tehran, Iran

4. Sharif University of Technology & IPM, Tehran, Iran

5. EPFL, Lausanne, Switzerland

6. Carnegie Mellon University, Pittsburgh, PA, USA

7. ETH Zurich & Carnegie Mellon University, Zurich, Switzerland

Abstract

Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp's aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8× larger capacity and improving overall GPU performance by 31% while reducing register file power consumption by 46%.
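
The interval idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a straight-line instruction sequence with no control flow, and it greedily closes an interval whenever the next instruction would push the register working-set past the register file cache capacity. The function name `split_into_intervals`, the `capacity` parameter, and the register names are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int, Set[str]]  # (start index, end index exclusive, working-set)


def split_into_intervals(instr_regs: List[Set[str]], capacity: int) -> List[Interval]:
    """Greedily partition a straight-line instruction list into intervals.

    instr_regs[i] is the set of registers instruction i touches.
    Each interval's working-set (union of its registers) has at most
    `capacity` entries, so it could be prefetched into a register file
    cache of that size before the interval starts.
    Illustrative assumption: no branches or loops are modeled.
    """
    intervals: List[Interval] = []
    start = 0
    working_set: Set[str] = set()
    for i, regs in enumerate(instr_regs):
        if len(regs) > capacity:
            raise ValueError(f"instruction {i} needs more registers than the cache holds")
        merged = working_set | regs
        if len(merged) > capacity:
            # Close the current interval and start a new one at instruction i.
            intervals.append((start, i, working_set))
            start = i
            working_set = set(regs)
        else:
            working_set = merged
    if start < len(instr_regs):
        intervals.append((start, len(instr_regs), working_set))
    return intervals


if __name__ == "__main__":
    program = [{"r0", "r1"}, {"r1", "r2"}, {"r3", "r4"}, {"r4", "r5"}, {"r0"}]
    for begin, end, regs in split_into_intervals(program, capacity=3):
        # A prefetch of `regs` would be issued before instruction `begin`.
        print(begin, end, sorted(regs))
```

In this simplified picture, the working-set of each interval is what would be prefetched from the main register file at the interval boundary, while other warps execute to hide the prefetch latency.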

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3296957.3173211

References: 84 articles.

1. "LTRF Register-Interval-Algorithm," https://github.com/CMU-SAFARI/Register-Interval.

2. Warped register file: A power efficient register file for GPGPUs

Cited by 4 articles.

1. Conflict-aware compiler for hierarchical register file on GPUs;Journal of Systems Architecture;2024-04

2. Criticality-aware priority to accelerate GPU memory access;The Journal of Supercomputing;2022-07-06

3. Virtual-Cache: A cache-line borrowing technique for efficient GPU cache architectures;Microprocessors and Microsystems;2021-09

4. Efficient Nearest-Neighbor Data Sharing in GPUs;ACM Transactions on Architecture and Code Optimization;2021-01-21