Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs

Published: 2019-01-08 Issue: 4 Volume: 15 Page: 1-23
ISSN: 1544-3566
Container-title: ACM Transactions on Architecture and Code Optimization
Language: en
Short-container-title: ACM Trans. Archit. Code Optim.

Author:

Crago Neal C.¹, Stephenson Mark¹, Keckler Stephen W.¹

Affiliation:

1. NVIDIA, Santa Clara, CA

Abstract

Modern computing workloads often have high memory intensity, requiring high bandwidth access to memory. The memory request patterns of these workloads vary and include regular strided accesses and indirect (pointer-based) accesses. Such applications require a large number of address generation instructions and a high degree of memory-level parallelism. This article proposes new memory instructions that exploit strided and indirect memory request patterns and improve efficiency in GPU architectures. The new instructions reduce address calculation instructions by offloading addressing to dedicated hardware, and reduce destructive memory request interference by grouping related requests together. Our results show that we can eliminate 33% of dynamic instructions across 16 GPU benchmarks. These improvements result in an overall runtime improvement of 26%, an energy reduction of 18%, and a reduction in energy-delay product of 32%.
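The two request patterns named in the abstract can be illustrated with a minimal sketch of the per-thread address arithmetic that conventional code has to perform. This is only an illustration under stated assumptions, not the instructions or hardware the article proposes: the function names, the 4-byte element size, and the example base address are all hypothetical.

```python
from typing import List

ELEM_SIZE = 4  # bytes per element; assumed 32-bit values (hypothetical)


def strided_addresses(base: int, stride: int, num_threads: int) -> List[int]:
    """Byte address each thread touches in a regular strided access.

    Thread t reads element t * stride, so its address is
    base + t * stride * ELEM_SIZE. Each thread needs a multiply and an add
    before it can issue its load.
    """
    return [base + t * stride * ELEM_SIZE for t in range(num_threads)]


def indirect_addresses(base: int, indices: List[int]) -> List[int]:
    """Byte address each thread touches in an indirect access, data[index[t]].

    The index is itself a value loaded from memory; it is then scaled by the
    element size and added to the base before the dependent load is issued.
    """
    return [base + idx * ELEM_SIZE for idx in indices]
```

For example, `strided_addresses(0x1000, 2, 4)` yields addresses 8 bytes apart starting at `0x1000`, while `indirect_addresses(0x1000, [3, 0, 7])` yields addresses whose order and spacing depend entirely on the index values. This per-thread arithmetic is the kind of address calculation the abstract describes offloading to dedicated hardware.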

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3280851

References: 44 articles.

1. Effective hardware-based data prefetching for high-performance processors

2. Rodinia: A benchmark suite for heterogeneous computing

3. Performance Evaluation of the Cray X1 Distributed Shared-Memory Architecture

Cited by 2 articles.

1. WASP: Exploiting GPU Pipeline Parallelism with Hardware-Accelerated Automatic Warp Specialization;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

2. Vector Operations for Accelerating Expensive Bayesian Computations – A Tutorial Guide;Bayesian Analysis;2021-01-01