Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

Published: 2020-11 Issue: 2243 Volume: 476 Page: 20200110
ISSN: 1364-5021
Journal: Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Language: English
Journal abbreviation: Proc. R. Soc. A

Author:

Azzam Haidar¹, Harun Bayraktar¹, Stanimire Tomov², Jack Dongarra²³⁴, Nicholas J. Higham⁴

Affiliation:

1. NVIDIA, Santa Clara, CA, USA

2. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA

3. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA

4. Department of Mathematics, University of Manchester, Manchester M13 9PL, UK

Abstract

Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to efficiently handle systems with multiple right-hand sides. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a 4× to 5× performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
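The basic idea behind the abstract's multiprecision LU approach can be sketched in plain Python. The sketch factors a scaled copy of A once in low precision, then recovers FP64 accuracy by iterating on FP64 residuals. It rests on several assumptions. FP16 is simulated by rounding through struct's 'e' (IEEE half) format. Every factorization operation is rounded to FP16, which is a cruder model than the FP16-multiply/FP32-accumulate arithmetic of real Tensor Cores. Scaling is a simple division by the largest |a_ij|, not the paper's scaling and auto-adaptive rounding. The names fp16, lu_fp16, lu_solve and mixed_precision_ir, the test matrix, and the stopping tolerance are all hypothetical. The GMRES-preconditioned variant is not shown.

```python
import struct
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def fp16(x: float) -> float:
    """Round an FP64 value to the nearest IEEE half-precision (FP16) value.

    struct's 'e' format raises OverflowError for finite values beyond the FP16
    range (largest finite value 65504), which is why A is scaled before rounding.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def lu_fp16(a: Matrix) -> Tuple[Matrix, List[int]]:
    """LU with partial pivoting, every intermediate result rounded to FP16.

    Returns L and U packed in one matrix (unit diagonal of L implicit) and perm,
    where row k of P*A is row perm[k] of A.
    """
    n = len(a)
    lu = [[fp16(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))  # pivot row
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("zero pivot in FP16 factorization")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = fp16(lu[i][k] / lu[k][k])  # multiplier l_ik
            for j in range(k + 1, n):
                lu[i][j] = fp16(lu[i][j] - fp16(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve(lu: Matrix, perm: List[int], rhs: Vector) -> Vector:
    """Solve (L U) y = P rhs in FP64 arithmetic using the FP16 factors."""
    n = len(lu)
    y = [rhs[perm[i]] for i in range(n)]
    for i in range(n):  # forward substitution with unit lower-triangular L
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):  # back substitution with U
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y


def mixed_precision_ir(a: Matrix, b: Vector, tol: float = 1e-14,
                       max_iter: int = 50) -> Tuple[Vector, int]:
    """FP16 LU plus FP64 iterative refinement; returns (x, refinement steps)."""
    n = len(a)
    s = max(abs(v) for row in a for v in row)  # scale so every |entry| <= 1
    lu, perm = lu_fp16([[v / s for v in row] for row in a])
    # A x = b  <=>  (A/s) x = b/s, so every solve divides its right-hand side by s.
    x = lu_solve(lu, perm, [v / s for v in b])
    a_norm = max(sum(abs(v) for v in row) for row in a)  # infinity norm of A
    b_norm = max(abs(v) for v in b)
    for it in range(max_iter):
        # Residual computed in FP64, the working precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        x_norm = max(abs(v) for v in x)
        backward_err = max(abs(v) for v in r) / (a_norm * x_norm + b_norm)
        if backward_err <= tol:
            return x, it
        d = lu_solve(lu, perm, [v / s for v in r])  # correction from the FP16 factors
        x = [x[i] + d[i] for i in range(n)]
    raise RuntimeError("iterative refinement did not converge")


if __name__ == "__main__":
    A = [[4e5, 1e5, 0.0], [1e5, 5e5, 2e5], [0.0, 2e5, 6e5]]
    b = [6e5, 1.7e6, 2.2e6]  # b = A @ [1, 2, 3]
    x, steps = mixed_precision_ir(A, b)
    print(x, steps)
```

Entries of this A reach 6e5, which exceeds the FP16 maximum of 65504. Factoring A/s instead of A keeps every entry representable. Only the O(n³) factorization runs in low precision. The O(n²) residuals and triangular solves stay in FP64, which is why a fast low-precision factorization can pay off as long as A is not too ill conditioned for the refinement to converge.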

Funder

Engineering and Physical Sciences Research Council

Publisher

The Royal Society

Subject

General Physics and Astronomy, General Engineering, General Mathematics

Link

https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.2020.0110

Cited by 30 articles.

1. Reducing Data Motion and Energy Consumption of Geospatial Modeling Applications Using Automated Precision Conversion. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 2023-10-31.

2. Leveraging Mixed Precision in Exponential Time Integration Methods. 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023-09-25.

3. Optimizing Communication in 2D Grid-Based MPI Applications at Exascale. Proceedings of the 30th European MPI Users' Group Meeting, 2023-09-11.

4. Acceleration of iterative refinement for singular value decomposition. Numerical Algorithms, 2023-07-19.

5. XHYPRE: a reliable parallel numerical algorithm library for solving large-scale sparse linear equations. CCF Transactions on High Performance Computing, 2023-04-04.