Algorithms for Efficient Reproducible Floating Point Summation

Published:2020-09-30 Issue:3 Volume:46 Page:1-49
ISSN:0098-3500
Container-title:ACM Transactions on Mathematical Software
language:en
Short-container-title:ACM Trans. Math. Softw.

Author:

Ahrens Peter¹, Demmel James², Nguyen Hong Diep²

Affiliation:

1. Massachusetts Institute of Technology, Cambridge, MA, USA

2. University of California Berkeley, Berkeley, CA, USA

Abstract

We define “reproducibility” as getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should not affect the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociative floating point addition, makes reproducibility challenging even for summation, or operations like the BLAS. We describe a “reproducible accumulator” data structure (the “binned number”) and associated algorithms to reproducibly sum binary floating point numbers, independent of summation order. We use a subset of the IEEE Floating Point Standard 754-2008 and bitwise operations on the standard representations in memory. Our approach requires only one read-only pass over the data, and one reduction in parallel, using a 6-word reproducible accumulator (more words can be used for higher accuracy), enabling standard tiling optimization techniques. Summing n words with a 6-word reproducible accumulator requires approximately 9n floating point operations (arithmetic, comparison, and absolute value) and approximately 3n bitwise operations. The final error bound with a 6-word reproducible accumulator and our default settings can be up to 2^29 times smaller than the error bound for conventional (recursive) summation on ill-conditioned double-precision inputs.
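
The order dependence mentioned in the abstract, and one way to remove it, can be illustrated with a short Python sketch. This is not the paper's binned-number algorithm: it is a simplified single-bin "pre-rounding" variant that needs a first pass to find the largest magnitude, whereas the abstract describes a one-pass, multi-word accumulator. The names `prerounded_sum`, `W`, and `anchor` are hypothetical, and the sketch assumes finite double-precision inputs of moderate magnitude (no overflow, underflow, infinities, or NaNs) and at most 2^(53 - W) of them.

```python
import math
import random

W = 40  # bits kept below the largest input's exponent (illustrative choice)


def prerounded_sum(xs):
    """Order-independent sum via single-bin pre-rounding (simplified sketch)."""
    if len(xs) > 2 ** (53 - W):
        raise ValueError("too many inputs for one bin at this width")
    biggest = max((abs(x) for x in xs), default=0.0)
    if biggest == 0.0:
        return 0.0
    _, e = math.frexp(biggest)            # guarantees biggest < 2**e
    # Adding and subtracting this constant rounds x to a multiple of
    # 2**(e - W), because ulp(anchor) == 2**(e - W) in double precision.
    anchor = math.ldexp(1.5, e - W + 52)
    total = 0.0
    for x in xs:
        q = (x + anchor) - anchor
        # Every q is an integer multiple of 2**(e - W) with at most W + 1
        # significant bits, so this addition is exact for the allowed
        # input count, and exact additions commute.
        total += q
    return total


# Floating point addition is not associative, so grouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

# Shuffling the inputs may change a naive sum but not the pre-rounded one.
rng = random.Random(0)
data = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-3, 3) for _ in range(1000)]
naive, stable = set(), set()
for _ in range(10):
    rng.shuffle(data)
    naive.add(sum(data))
    stable.add(prerounded_sum(data))
print(len(naive), len(stable))
```

The trade-off in this sketch is accuracy for reproducibility: each input is rounded to a multiple of 2^(e - W), so the result can differ from the exact sum by up to n · 2^(e - W - 1). Keeping further bins for the discarded low-order bits is the direction the abstract's multi-word accumulator takes.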

Funder

DARPA XDATA

Nokia

DOE Computational Science Graduate Fellowship

Mathworks

DARPA

ASPIRE Lab

LGE

Samsung

Cray

NSF

Intel

DOE

Intel ITSC

Google

Huawei

NVIDIA

Oracle

Aramco

Publisher

Association for Computing Machinery (ACM)

Subject

Applied Mathematics, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3389360

Reference: 31 articles.

1. Intel. 2018. Developer Reference for Intel® Math Kernel Library 2018 - C | Intel® Software. Retrieved from https://software.intel.com/en-us/download/developer-reference-for-intel-math-kernel-library-2018-c.

2. NVIDIA. 2018. NVIDIA® cuBLAS. Retrieved from http://docs.nvidia.com/cuda/cublas/index.html.

3. Intel. 2019. bfloat16 - HardwareNumerics Definition. Retrieved from https://software.intel.com/sites/default/files/managed/40/8b/bf16-hardware-numerics-definition-white-paper.pdf.

Cited by 12 articles.

1. Integration of Posit Arithmetic in RISC-V Targeting Low-Power Computations;2024 IEEE 24th International Conference on Nanotechnology (NANO);2024-07-08

2. Useful applications of correctly-rounded operators of the form ab + cd + e;2024 IEEE 31st Symposium on Computer Arithmetic (ARITH);2024-06-10

3. Asynchronous Multi-Level Checkpointing: An Enabler of Reproducibility using Checkpoint History Analytics;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12

4. ddRingAllreduce: a high-precision RingAllreduce algorithm;CCF Transactions on High Performance Computing;2023-07-05

5. Improving accuracy of summation using parallel vectorized Kahan's and Gill‐Møller algorithms;Concurrency and Computation: Practice and Experience;2023-05-10