A case for core-assisted bottleneck acceleration in GPUs

Published:2016-01-04 Issue:3S Volume:43 Page:41-53
ISSN:0163-5964
Container-title:ACM SIGARCH Computer Architecture News
language:en
Short-container-title:SIGARCH Comput. Archit. News

Author:

Vijaykumar Nandita¹, Pekhimenko Gennady¹, Jog Adwait², Bhowmick Abhishek¹, Ausavarungnirun Rachata¹, Das Chita², Kandemir Mahmut², Mowry Todd C.¹, Mutlu Onur¹

Affiliation:

1. Carnegie Mellon University

2. Pennsylvania State University

Abstract

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent execution of thousands of threads. Unfortunately, different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores. For example, when a GPU is bottlenecked by the available off-chip memory bandwidth, its computational resources are often overwhelmingly idle, waiting for data from memory to arrive. This paper introduces the Core-Assisted Bottleneck Acceleration (CABA) framework that employs idle on-chip resources to alleviate different bottlenecks in GPU execution. CABA provides flexible mechanisms to automatically generate "assist warps" that execute on GPU cores to perform specific tasks that can improve GPU performance and efficiency. CABA enables the use of idle computational units and pipelines to alleviate the memory bandwidth bottleneck, e.g., by using assist warps to perform data compression to transfer less data from memory. Conversely, the same framework can be employed to handle cases where the GPU is bottlenecked by the available computational units, in which case the memory pipelines are idle and can be used by CABA to speed up computation, e.g., by performing memoization using assist warps. We provide a comprehensive design and evaluation of CABA to perform effective and flexible data compression in the GPU memory hierarchy to alleviate the memory bandwidth bottleneck. Our extensive evaluations show that CABA, when used to implement data compression, provides an average performance improvement of 41.7% (as high as 2.6X) across a variety of memory-bandwidth-sensitive GPGPU applications.
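The abstract names data compression as the main use of assist warps but does not spell out the encoding. The sketch below assumes a simplified base-plus-delta scheme, in the spirit of Base-Delta-Immediate (BDI) compression, to show why this kind of work suits otherwise idle ALUs. Decompressing a line takes only one addition per word, which maps naturally onto the parallel lanes of a warp. The 128-byte line size, the 32-bit word size, and every function name (`try_base_delta`, `compress_line`, `decompress`) are hypothetical choices made for this illustration. This is not the authors' implementation.

```python
from typing import List, Optional, Tuple

WORD_BYTES = 4   # assumption: 32-bit words
LINE_WORDS = 32  # assumption: 128-byte line = 32 x 4-byte words


def try_base_delta(words: List[int], delta_bytes: int) -> Optional[Tuple[int, List[int]]]:
    """Encode a line as (base, deltas) if every delta fits in a signed delta_bytes-byte field.

    Simplified scheme (hypothetical): the first word is the base. Real BDI also
    uses an implicit zero base and more base/delta size combinations.
    """
    base = words[0]
    half = 1 << (8 * delta_bytes - 1)  # a signed field covers [-half, half)
    deltas = [w - base for w in words]
    if all(-half <= d < half for d in deltas):
        return base, deltas
    return None  # deltas too wide: this encoding does not apply


def compress_line(words: List[int]) -> Tuple[str, int]:
    """Pick the smallest applicable encoding; return (encoding name, size in bytes)."""
    for delta_bytes in (1, 2):
        if try_base_delta(words, delta_bytes) is not None:
            return f"base4-delta{delta_bytes}", WORD_BYTES + delta_bytes * len(words)
    return "uncompressed", WORD_BYTES * len(words)


def decompress(base: int, deltas: List[int]) -> List[int]:
    """Rebuild the line: one add per word, i.e. one add per lane in a warp."""
    return [base + d for d in deltas]


if __name__ == "__main__":
    # A line of nearby pointers, a common case for low-dynamic-range GPU data.
    line = [0x10000000 + 4 * i for i in range(LINE_WORDS)]
    print(compress_line(line))  # ('base4-delta1', 36)
    base, deltas = try_base_delta(line, 1)
    assert decompress(base, deltas) == line
```

In this example a 128-byte line shrinks to 36 bytes, so roughly 3.5x less data would cross the memory bus for that line. Lines whose values are far apart fall back to the uncompressed size, which is why a real design has to pick the encoding per line.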

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2872887.2750399

Cited by 1 article.

1. Selective Memory Compression for GPU Memory Oversubscription Management; Proceedings of the 53rd International Conference on Parallel Processing; 2024-08-12