Affiliation:
1. Universitat Politècnica de Catalunya
2. Intel Barcelona Research Center, Intel Labs Barcelona
Abstract
Smartphones represent one of the fastest-growing markets, with significant hardware and software improvements arriving every few months. However, supporting these capabilities reduces the operating time per battery charge. The CPU/GPU component is left with only a shrinking fraction of the power budget, since most of the energy is consumed by the screen and the antenna.
In this paper, we focus on improving the energy efficiency of the GPU, since graphical applications constitute an important part of the existing market. Moreover, the trend towards better screens will inevitably lead to a higher demand for improved graphics rendering. We show that the main bottleneck for these applications is the texture cache, and that traditional techniques for hiding memory latency (prefetching, multithreading) either do not work well or come at a high energy cost.
We therefore propose migrating GPU designs towards the decoupled access-execute concept. Furthermore, we significantly reduce bandwidth usage in the decoupled architecture by exploiting inter-core data sharing. Using commercial Android applications, we show that the final design can achieve 93% of the performance of a heavily multithreaded GPU while providing energy savings of 34%.
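To illustrate why decoupling access from execution can hide texture-fetch latency, here is a toy cycle-count model. It is not the paper's simulator or design. It rests on several simplifying assumptions: each fragment needs exactly one texture fetch with a fixed latency, memory is fully pipelined and accepts one request per cycle, and the access and execute units communicate through a FIFO of depth `queue_depth`. The function names `coupled_cycles` and `decoupled_cycles` are hypothetical. The model says nothing about energy or inter-core data sharing.

```python
def coupled_cycles(n, latency, compute):
    """Hypothetical coupled core: every fragment stalls on its texture fetch, then computes."""
    return n * (latency + compute)


def decoupled_cycles(n, latency, compute, queue_depth):
    """Hypothetical decoupled access-execute pair linked by a FIFO of `queue_depth` entries.

    Assumptions: one fetch per fragment, fixed memory latency, and at most one
    fetch issued per cycle. The access unit runs ahead of the execute unit
    until the FIFO is full.
    """
    assert queue_depth >= 1, "the FIFO needs at least one entry"
    issue = [0] * n   # cycle in which the access unit issues fetch i
    start = [0] * n   # cycle in which the execute unit starts fragment i
    finish = [0] * n  # cycle in which the execute unit finishes fragment i
    for i in range(n):
        # At most one fetch can be issued per cycle.
        t = issue[i - 1] + 1 if i > 0 else 0
        # Fetch i needs a FIFO slot; slot (i - queue_depth) frees up once the
        # execute unit starts consuming that entry.
        if i >= queue_depth:
            t = max(t, start[i - queue_depth])
        issue[i] = t
        ready = t + latency                   # texel data arrives in the FIFO
        prev_done = finish[i - 1] if i > 0 else 0
        start[i] = max(ready, prev_done)      # execute waits for data and for itself
        finish[i] = start[i] + compute
    return finish[-1] if n else 0


if __name__ == "__main__":
    print("coupled:  ", coupled_cycles(100, 100, 4))
    print("decoupled:", decoupled_cycles(100, 100, 4, queue_depth=32))
```

With 100 fragments, a 100-cycle fetch latency and 4 compute cycles each, the coupled model takes 10400 cycles. The decoupled model with a 32-entry FIFO takes 500 cycles, because 32 × 4 = 128 cycles of buffered work exceed the 100-cycle latency. With a 1-entry FIFO the decoupled model takes 10004 cycles, showing that the run-ahead distance, not decoupling alone, is what hides the latency.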
Publisher
Association for Computing Machinery (ACM)
Cited by
3 articles.
1. Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs;2023 32nd International Conference on Parallel Architectures and Compilation Techniques (PACT);2023-10-21
2. Optimization strategies for GPUs: an overview of architectural approaches;International Journal of Parallel, Emergent and Distributed Systems;2023-02-05
3. Stream data prefetcher for the GPU memory interface;The Journal of Supercomputing;2018-01-27