Affiliation:
1. Chalmers University of Technology, Göteborg, Sweden
Abstract
Reducing the precision of floating-point values can improve performance and/or reduce energy expenditure in computer graphics, among other applications. However, reducing the precision of floating-point values in a controlled fashion needs support at both the compiler and the microarchitecture level. At the compiler level, a method is needed to automate the reduction of precision of each floating-point value. At the microarchitecture level, a lower precision of each floating-point register can allow more floating-point values to be packed into a register file. This, however, calls for new register file organizations.
This article proposes an automated precision-selection method and a novel GPU register file organization that can densely store floating-point register values at arbitrary precisions. The automated precision-selection method uses a data-driven approach for setting the precision level of floating-point values, given a quality threshold and a representative set of input data. By allowing a small, but acceptable, degradation in output quality, our method can remove a significant fraction of the bits needed to represent floating-point values in the investigated kernels (between 28% and 60%). Our proposed register file organization exploits these lower-precision floating-point values by packing several of them into the same physical register. This reduces the register pressure per thread by up to 48%, and by 27% on average, for a negligible output-quality degradation. This can enable GPUs to keep up to twice as many threads in flight simultaneously.
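The data-driven idea described above (lower each value's precision as long as output quality on representative inputs stays within a threshold) can be illustrated with a minimal sketch. This is not the article's algorithm: the greedy one-variable-at-a-time search, the relative-error quality metric, and every name here (`truncate_mantissa`, `toy_kernel`, `quality_loss`, `select_precisions`) are assumptions made only for illustration.

```python
import struct

def truncate_mantissa(x: float, bits: int) -> float:
    """Convert x to float32, then zero all but the top `bits` of its 23 mantissa bits."""
    if not 0 <= bits <= 23:
        raise ValueError("bits must be in [0, 23]")
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    mask = (0xFFFFFFFF << (23 - bits)) & 0xFFFFFFFF
    (y,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return y

def toy_kernel(sample, prec):
    """Hypothetical kernel: out = a * b + c, with a per-variable mantissa width."""
    a, b, c = sample
    prod = truncate_mantissa(a * b, prec["prod"])
    return truncate_mantissa(prod + c, prec["out"])

def quality_loss(kernel, samples, prec, reference):
    """Assumed quality metric: worst-case relative error against the reference outputs."""
    return max(
        abs(kernel(s, prec) - r) / max(abs(r), 1e-30)
        for s, r in zip(samples, reference)
    )

def select_precisions(kernel, samples, variables, threshold):
    """Greedily drop mantissa bits per variable while the loss stays within threshold."""
    prec = {v: 23 for v in variables}
    reference = [kernel(s, prec) for s in samples]  # full float32 baseline
    for v in variables:
        while prec[v] > 0:
            prec[v] -= 1
            if quality_loss(kernel, samples, prec, reference) > threshold:
                prec[v] += 1  # undo the step that violated the threshold
                break
    return prec

samples = [(1.5, 2.25, 0.125), (3.0, 0.1, 7.0), (0.7, 0.9, 1.3)]
chosen = select_precisions(toy_kernel, samples, ["prod", "out"], threshold=1e-3)
print(chosen)
```

The result depends entirely on the supplied samples, which is why the input set must be representative: a value that needs more bits on unseen data would be under-provisioned by a search like this.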
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
9 articles.
1. SeTHet - Sending Tuned numbers over DMA onto Heterogeneous clusters: an automated precision tuning story;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07
2. The Impact of Profiling Versus Static Analysis in Precision Tuning;IEEE Access;2024
3. Constrained Precision Tuning;2022 8th International Conference on Control, Decision and Information Technologies (CoDIT);2022-05-17
4. L²C: Combining Lossy and Lossless Compression on Memory and I/O;ACM Transactions on Embedded Computing Systems;2022-01-14
5. Tools for Reduced Precision Computation;ACM Computing Surveys;2021-03-31