Affiliation:
1. Massachusetts Institute of Technology, Cambridge, MA, USA
2. University of California Berkeley, Berkeley, CA, USA
Abstract
We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
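The order dependence described in the abstract can be seen in a few lines of Python. The sketch below is only an illustration and is not the paper's binned-number algorithm: as a stand-in for an order-independent sum it uses the standard library's `math.fsum`, which returns a correctly rounded result on typical IEEE 754 platforms and therefore does not depend on summation order there. The helper name `naive_sum` and the sample data are hypothetical.

```python
import math
import random


def naive_sum(values):
    """Conventional (recursive) summation: add left to right in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
reordered = [1e16, -1e16, 1.0, 1.0]

# Floating point addition is not associative, so the order changes the result:
# 1e16 + 1.0 rounds back to 1e16, and that 1.0 is lost in the first ordering.
print(naive_sum(values))     # 1.0
print(naive_sum(reordered))  # 2.0

# A correctly rounded sum gives the same answer for every ordering
# (assumption: IEEE 754 doubles with round-half-even, no intermediate overflow).
shuffled = values[:]
random.Random(0).shuffle(shuffled)
print(math.fsum(values) == math.fsum(reordered) == math.fsum(shuffled))  # True
```

Unlike the one-pass, fixed-size accumulator described in the abstract, `math.fsum` keeps a list of partial sums that can grow with the input, so it demonstrates the reproducibility goal rather than the paper's cost profile.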
Funder
DARPA XDATA
HP
Nokia
DOE Computational Science Graduate Fellowship
Mathworks
DARPA
ASPIRE Lab
LGE
Samsung
Cray
NSF
Intel
DOE
Intel ITSC
Google
Huawei
NVIDIA
Oracle
Aramco
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Reference
31 articles.
1. Intel. 2018. Developer Reference for Intel® Math Kernel Library 2018 - C | Intel® Software. Retrieved from https://software.intel.com/en-us/download/developer-reference-for-intel-math-kernel-library-2018-c.
2. NVIDIA. 2018. NVIDIA® cuBLAS. Retrieved from http://docs.nvidia.com/cuda/cublas/index.html.
3. Intel. 2019. bfloat16 - Hardware Numerics Definition. Retrieved from https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf.
Cited by
12 articles.
1. Integration of Posit Arithmetic in RISC-V Targeting Low-Power Computations;2024 IEEE 24th International Conference on Nanotechnology (NANO);2024-07-08
2. Useful applications of correctly-rounded operators of the form ab + cd + e;2024 IEEE 31st Symposium on Computer Arithmetic (ARITH);2024-06-10
3. Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12
4. ddRingAllreduce: a high-precision RingAllreduce algorithm;CCF Transactions on High Performance Computing;2023-07-05
5. Improving accuracy of summation using parallel vectorized Kahan's and Gill‐Møller algorithms;Concurrency and Computation: Practice and Experience;2023-05-10