Affiliation:
1. Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN
Abstract
Singular Value QR (SVQR) can orthonormalize a set of dense vectors with minimal communication: it requires only one global reduction between the parallel processing units and uses BLAS-3 for most of its local computation. As a result, compared to other orthogonalization schemes, SVQR obtains superior performance on many current computers, where communication has become significantly more expensive than arithmetic operations. In this article, we study the stability and performance of various SVQR implementations on multicore CPUs with a GPU. Our focus is on the dense triangular solve, which performs half of the total floating-point operations of SVQR. As part of this study, we examine an adaptive mixed-precision variant of SVQR, which decides at runtime whether lower-precision arithmetic can be used for the triangular solve without increasing the order of its orthogonality error (though its backward error is significantly greater). If the greater backward error can be tolerated, then our performance results with an NVIDIA Kepler GPU show that the mixed-precision SVQR can obtain a speedup of up to 1.36 over the standard SVQR.
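The structure described in the abstract (one reduction to form a small Gram matrix, then a triangular solve that carries about half of the flops) can be illustrated with a minimal NumPy sketch. This is not the authors' CPU/GPU implementation. The function name `svqr`, the `solve_dtype` parameter, and the eigenvalue floor are assumptions made for illustration, the usual diagonal scaling of the Gram matrix is omitted, and the lower precision is chosen by the caller rather than decided adaptively at runtime, since the abstract does not state the criterion.

```python
import numpy as np

def svqr(V, solve_dtype=np.float64):
    """Illustrative single SVQR pass: returns Q, R with V ~= Q @ R.

    solve_dtype is a hypothetical knob: pass np.float32 to run only the
    triangular solve in lower precision.
    """
    # Gram matrix. In a distributed setting each process forms its local
    # V_i^T V_i (BLAS-3) and a single global reduction sums them.
    B = V.T @ V

    # B is symmetric positive semidefinite, so its eigendecomposition
    # B = U diag(sigma) U^T coincides with its SVD.
    sigma, U = np.linalg.eigh(B)

    # Assumed safeguard: floor tiny or negative eigenvalues so that the
    # square root below is well defined.
    floor = np.finfo(V.dtype).eps * sigma.max()
    sigma = np.maximum(sigma, floor)

    # R is the triangular factor of sqrt(Sigma) U^T, hence R^T R = B.
    R = np.linalg.qr(np.sqrt(sigma)[:, None] * U.T, mode="r")

    # Solve Q R = V for Q. A general solver on R^T stands in for the
    # BLAS-3 triangular solve (TRSM) a real implementation would call.
    Rt = R.T.astype(solve_dtype)
    Q = np.linalg.solve(Rt, V.T.astype(solve_dtype)).T
    return Q.astype(V.dtype), R

def orthogonality_error(Q):
    """Frobenius norm of I - Q^T Q."""
    n = Q.shape[1]
    return np.linalg.norm(np.eye(n) - Q.T @ Q)

rng = np.random.default_rng(0)
V = rng.standard_normal((200, 8))

Q64, R64 = svqr(V)                          # solve in double precision
Q32, R32 = svqr(V, solve_dtype=np.float32)  # solve in single precision

print("orthogonality error, float64 solve:", orthogonality_error(Q64))
print("orthogonality error, float32 solve:", orthogonality_error(Q32))
```

For a well-conditioned random input like this one, the single-precision solve gives a visibly larger orthogonality error than the double-precision solve. The case studied in the article, where lower precision does not increase the order of the error, depends on the input and on the runtime decision that this sketch leaves out.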
Funder
Collaborative Research: SDCI HPC Improvement
Russian Scientific Fund
“Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems.”
Community Based Dense Linear Algebra Software for Extreme Scale Computational Science, DOE
“Extreme-scale Algorithms & Solver Resilience (EASIR),”
[NSF] SDCI - National Science Foundation Award
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics, Software
Cited by
9 articles.