Energy-efficient mechanisms for managing thread context in throughput processors

Authors:

Mark Gebhart¹, Daniel R. Johnson², David Tarjan³, Stephen W. Keckler⁴, William J. Dally⁵, Erik Lindholm³, Kevin Skadron⁶

Affiliation:

1. The University of Texas at Austin, Austin, TX, USA

2. University of Illinois at Urbana-Champaign, Urbana, IL, USA

3. NVIDIA, Santa Clara, CA, USA

4. NVIDIA / The University of Texas at Austin, Santa Clara, CA, USA

5. NVIDIA / Stanford University, Santa Clara, CA, USA

6. University of Virginia, Charlottesville, VA, USA

Abstract

Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a smaller structure containing the immediate register working set of active threads. Second, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Combined with register file caching, a two-level thread scheduler provides a further reduction in energy by limiting the allocation of temporary register cache resources to only the currently active subset of threads. We show that on average, across a variety of real world graphics and compute workloads, a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59% respectively. We further show that the active thread count can be reduced by a factor of 4 with minimal impact on performance, resulting in a 36% reduction of register file energy.
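The register file caching idea described above can be illustrated with a small simulation. The design below is a minimal sketch, not the paper's implementation: the 6-entry capacity matches the abstract, but the LRU replacement and write-back policy, and the `RegisterFileCache` class itself, are illustrative assumptions. It shows the key effect: reads that hit the cache avoid main register file (MRF) reads, and writes are deferred until a dirty entry is evicted.

```python
from collections import OrderedDict

class RegisterFileCache:
    """Illustrative per-thread register file cache (assumed LRU, write-back).

    Reads that hit avoid a main register file (MRF) read; written registers
    are allocated in the cache and only written back to the MRF on eviction
    of a dirty entry. Capacity and policies are assumptions for illustration.
    """

    def __init__(self, entries=6):
        self.entries = entries
        self.cache = OrderedDict()  # register id -> dirty flag
        self.mrf_reads = 0
        self.mrf_writes = 0

    def _evict_if_full(self):
        if len(self.cache) >= self.entries:
            _, dirty = self.cache.popitem(last=False)  # evict LRU entry
            if dirty:
                self.mrf_writes += 1  # write back dirty value to MRF

    def read(self, reg):
        if reg in self.cache:
            self.cache.move_to_end(reg)  # hit: no MRF access needed
            return
        self.mrf_reads += 1  # miss: value fetched from MRF
        self._evict_if_full()
        self.cache[reg] = False

    def write(self, reg):
        if reg in self.cache:
            self.cache.move_to_end(reg)
        else:
            self._evict_if_full()
        self.cache[reg] = True  # allocate dirty; MRF write deferred

rfc = RegisterFileCache(entries=6)
# Short producer/consumer pattern: values written, then reread soon after,
# as is typical for the immediate register working set of a thread.
for reg in [1, 2, 1, 2, 3]:
    rfc.write(reg)
for reg in [1, 2, 3, 1]:
    rfc.read(reg)
print(rfc.mrf_reads, rfc.mrf_writes)  # → 0 0 (all accesses filtered)
```

Because the working set (three registers) fits in the cache, every MRF access in this toy trace is filtered; with larger working sets or eviction pressure, misses and write-backs reappear, which is why the paper pairs the cache with a two-level scheduler that limits how many threads hold cache entries at once.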

Publisher

Association for Computing Machinery (ACM)

References (36 articles, first 5 shown)

1. APRIL

2. The Tera computer system

3. AMD. R600-Family Instruction Set Architecture. http://developer.amd.com/gpu_assets/R600_Instruction_Set_Architecture.pdf, January 2009.

4. AMD. ATI Stream Computing OpenCL Programming Guide. http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf, August 2010.

5. AMD. HD 6900 Series Instruction Set Architecture. http://developer.amd.com/gpu/amdappsdk/assets/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf, February 2011.

Cited by 109 articles (first 5 shown)

1. LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs;IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences;2023-08-01

2. WSMP: a warp scheduling strategy based on MFQ and PPF;The Journal of Supercomputing;2023-03-10

3. Lightweight Register File Caching in Collector Units for GPUs;Proceedings of the 15th Workshop on General Purpose Processing Using GPU;2023-02-25

4. Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

5. Mitigating GPU Core Partitioning Performance Effects;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

