Energy-efficient mechanisms for managing thread context in throughput processors

Authors:

Mark Gebhart¹, Daniel R. Johnson², David Tarjan³, Stephen W. Keckler⁴, William J. Dally⁵, Erik Lindholm³, Kevin Skadron⁶

Affiliation:

1. The University of Texas at Austin, Austin, TX, USA

2. University of Illinois at Urbana-Champaign, Urbana, IL, USA

3. NVIDIA, Santa Clara, CA, USA

4. NVIDIA / The University of Texas at Austin, Santa Clara, CA, USA

5. NVIDIA / Stanford University, Santa Clara, CA, USA

6. University of Virginia, Charlottesville, VA, USA

Abstract

Modern graphics processing units (GPUs) use a large number of hardware threads to hide both function unit and memory access latency. Extreme multithreading requires a complicated thread scheduler as well as a large register file, which is expensive to access both in terms of energy and latency. We present two complementary techniques for reducing energy on massively-threaded processors such as GPUs. First, we examine register file caching to replace accesses to the large main register file with accesses to a smaller structure containing the immediate register working set of active threads. Second, we investigate a two-level thread scheduler that maintains a small set of active threads to hide ALU and local memory access latency and a larger set of pending threads to hide main memory latency. Combined with register file caching, a two-level thread scheduler provides a further reduction in energy by limiting the allocation of temporary register cache resources to only the currently active subset of threads. We show that on average, across a variety of real world graphics and compute workloads, a 6-entry per-thread register file cache reduces the number of reads and writes to the main register file by 50% and 59% respectively. We further show that the active thread count can be reduced by a factor of 4 with minimal impact on performance, resulting in a 36% reduction of register file energy.
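The register file caching idea described above can be illustrated with a small simulation. The design below is a minimal sketch, not the paper's implementation: the 6-entry capacity matches the abstract, but the LRU replacement and write-back policy, and the `RegisterFileCache` class itself, are illustrative assumptions. It shows the key effect: reads that hit the cache avoid main register file (MRF) reads, and writes are deferred until a dirty entry is evicted.

```python
from collections import OrderedDict

class RegisterFileCache:
    """Illustrative per-thread register file cache (assumed LRU, write-back).

    Reads that hit avoid a main register file (MRF) read; written registers
    are allocated in the cache and only written back to the MRF on eviction
    of a dirty entry. Capacity and policies are assumptions for illustration.
    """

    def __init__(self, entries=6):
        self.entries = entries
        self.cache = OrderedDict()  # register id -> dirty flag
        self.mrf_reads = 0
        self.mrf_writes = 0

    def _evict_if_full(self):
        if len(self.cache) >= self.entries:
            _, dirty = self.cache.popitem(last=False)  # evict LRU entry
            if dirty:
                self.mrf_writes += 1  # write back dirty value to MRF

    def read(self, reg):
        if reg in self.cache:
            self.cache.move_to_end(reg)  # hit: no MRF access needed
            return
        self.mrf_reads += 1  # miss: value fetched from MRF
        self._evict_if_full()
        self.cache[reg] = False

    def write(self, reg):
        if reg in self.cache:
            self.cache.move_to_end(reg)
        else:
            self._evict_if_full()
        self.cache[reg] = True  # allocate dirty; MRF write deferred

rfc = RegisterFileCache(entries=6)
# Short producer/consumer pattern: values written, then reread soon after,
# as is typical for the immediate register working set of a thread.
for reg in [1, 2, 1, 2, 3]:
    rfc.write(reg)
for reg in [1, 2, 3, 1]:
    rfc.read(reg)
print(rfc.mrf_reads, rfc.mrf_writes)  # → 0 0 (all accesses filtered)
```

Because the working set (three registers) fits in the cache, every MRF access in this toy trace is filtered; with larger working sets or eviction pressure, misses and write-backs reappear, which is why the paper pairs the cache with a two-level scheduler that limits how many threads hold cache entries at once.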

Publisher

Association for Computing Machinery (ACM)

References (36 articles, first 5 shown)

1. APRIL

2. The Tera computer system

3. AMD. R600-Family Instruction Set Architecture. http://developer.amd.com/gpu_assets/R600_Instruction_Set_Architecture.pdf, January 2009.

4. AMD. ATI Stream Computing OpenCL Programming Guide. http://developer.amd.com/gpu/ATIStreamSDK/assets/ATI_Stream_SDK_OpenCL_Programming_Guide.pdf, August 2010.

5. AMD. HD 6900 Series Instruction Set Architecture. http://developer.amd.com/gpu/amdappsdk/assets/AMD_HD_6900_Series_Instruction_Set_Architecture.pdf, February 2011.

Cited by 109 articles (first 5 shown)

1. LFWS: Long-Operation First Warp Scheduling Algorithm to Effectively Hide the Latency for GPUs;IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences;2023-08-01

2. WSMP: a warp scheduling strategy based on MFQ and PPF;The Journal of Supercomputing;2023-03-10

3. Lightweight Register File Caching in Collector Units for GPUs;Proceedings of the 15th Workshop on General Purpose Processing Using GPU;2023-02-25

4. Know Your Enemy To Save Cloud Energy: Energy-Performance Characterization of Machine Learning Serving;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

5. Mitigating GPU Core Partitioning Performance Effects;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02

