Boosting mobile GPU performance with a decoupled access/execute fragment processor

Published: 2012-09-05 Issue: 3 Volume: 40 Page: 84-93
ISSN: 0163-5964
Container-title: ACM SIGARCH Computer Architecture News
Language: en
Short-container-title: SIGARCH Comput. Archit. News

Author:

Arnau José-María¹, Parcerisa Joan-Manuel¹, Xekalakis Polychronis²

Affiliation:

1. Universitat Politècnica de Catalunya

2. Intel Barcelona Research Center, Intel Labs Barcelona

Abstract

Smartphones represent one of the fastest growing markets, providing significant hardware/software improvements every few months. However, supporting these capabilities reduces the operating time per battery charge. The CPU/GPU component is only left with a shrinking fraction of the power budget, since most of the energy is consumed by the screen and the antenna. In this paper, we focus on improving the energy efficiency of the GPU since graphical applications constitute an important part of the existing market. Moreover, the trend towards better screens will inevitably lead to a higher demand for improved graphics rendering. We show that the main bottleneck for these applications is the texture cache and that traditional techniques for hiding memory latency (prefetching, multithreading) do not work well or come at a high energy cost. We thus propose the migration of GPU designs towards the decoupled access-execute concept. Furthermore, we significantly reduce bandwidth usage in the decoupled architecture by exploiting inter-core data sharing. Using commercial Android applications, we show that the end design can achieve 93% of the performance of a heavily multithreaded GPU while providing energy savings of 34%.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/2366231.2337169

Reference: 34 articles.

1. NoC-aware cache design for chip multiprocessors

2. Graphics for the masses

3. OUTRIDER

4. Stride directed prefetching in scalar processors

Cited by 3 articles.

1. Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs;2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT);2023-10-21

2. Optimization strategies for GPUs: an overview of architectural approaches;International Journal of Parallel, Emergent and Distributed Systems;2023-02-05

3. Stream data prefetcher for the GPU memory interface;The Journal of Supercomputing;2018-01-27