Affiliation:
1. University of Utah, USA
Abstract
Virtually all real-valued computations are carried out using floating-point data types and operations. The precision of these data types must be chosen to reduce the overall round-off error while also improving performance. Often, a mixed-precision allocation achieves this optimum; unfortunately, there are no techniques available to compute such allocations and conservatively meet a given error target across all program inputs. In this work, we present a rigorous approach to precision allocation based on formal analysis via Symbolic Taylor Expansions, and error analysis based on interval functions. This approach is implemented in an automated tool called FPTuner that generates and solves a quadratically constrained quadratic program to obtain a precision-annotated version of the given expression. FPTuner automatically introduces all the requisite precision up-casting and down-casting operations. It also allows users to flexibly control precision allocation using constraints that cap the number of high-precision operators and that group operators so they are allocated the same precision, which facilitates vectorization. We evaluate FPTuner by tuning several benchmarks and measuring the proportion of lower-precision operators allocated as we increase the error threshold. We also measure the reduction in energy consumption resulting from executing mixed-precision tuned code on a real hardware platform. We observe significant energy savings in response to mixed-precision tuning, but also observe situations where unexpected compiler behaviors thwart intended optimizations.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
78 articles.
1. SMT Theory Arbitrage: Approximating Unbounded Constraints using Bounded Theories;Proceedings of the ACM on Programming Languages;2024-06-20
2. MixPert: Optimizing Mixed-Precision Floating-Point Emulation on GPU Integer Tensor Cores;Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems;2024-06-20
3. Compile-Time Optimization of the Energy Consumption of Numerical Computations;Proceedings of the 21st ACM International Conference on Computing Frontiers: Workshops and Special Sessions;2024-05-07
4. SeTHet - Sending Tuned numbers over DMA onto Heterogeneous clusters: an automated precision tuning story;Proceedings of the 21st ACM International Conference on Computing Frontiers;2024-05-07
5. Interleaved Execution of Approximated CUDA Kernels in Iterative Applications;2024 32nd Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP);2024-03-20