Affiliation:
1. Institute of Computer Engineering (ZITI), Heidelberg, Germany
Abstract
In this article, we present a CUDA library with a C API for solving block cyclic tridiagonal and banded systems on a single GPU. The library can process block tridiagonal systems with block sizes from 1 × 1 (scalar) to 4 × 4 and banded systems with up to four sub- and superdiagonals. For the compute-intensive block size cases and cases with many right-hand sides, we write out an explicit factorization to memory; for the scalar case, however, the fastest approach is to output only the coarse system and recompute the factorization. Prominent features of the library are (scaled) partial pivoting for improved numerical stability; high-performance kernels that fully utilize GPU memory bandwidth; and support for multiple sparse or dense right-hand side and solution vectors. The additional memory consumption is only 5% of the size of the original tridiagonal system, which enables the solution of systems up to GPU memory size. For large problem sizes of 2^25 unknowns, the library outperforms the state-of-the-art scalar tridiagonal solver of cuSPARSE by a factor of 5 on a GeForce RTX 2080 Ti.
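To illustrate what partial pivoting means for a scalar tridiagonal system, the following is a minimal sequential sketch in C. It is not the library's API or its GPU algorithm: the function name `tridiag_solve_pp` and its argument layout are hypothetical, the system is non-cyclic, and the pivoting is unscaled. It only shows why a row interchange creates fill-in on a second superdiagonal, which any pivoting tridiagonal solver has to store.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper (not part of the library described above):
 * solves a non-cyclic scalar tridiagonal system A x = b by Gaussian
 * elimination with (unscaled) partial pivoting.
 *   dl[0..n-2]  subdiagonal,   d[0..n-1]  diagonal,
 *   du[0..n-2]  superdiagonal, b[0..n-1]  right-hand side.
 * All arrays are overwritten; on success b holds the solution.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int tridiag_solve_pp(int n, double *dl, double *d, double *du, double *b)
{
    assert(n >= 1);

    for (int i = 0; i < n - 1; ++i) {
        if (fabs(d[i]) >= fabs(dl[i])) {
            /* Diagonal entry is the larger pivot: no row interchange. */
            if (d[i] == 0.0)
                return -1;
            double m = dl[i] / d[i];
            d[i + 1] -= m * du[i];
            b[i + 1] -= m * b[i];
            dl[i] = 0.0;              /* no fill-in at position (i, i+2) */
        } else {
            /* Subdiagonal entry is larger: interchange rows i and i+1. */
            double m = d[i] / dl[i];
            d[i] = dl[i];
            double tmp = d[i + 1];
            d[i + 1] = du[i] - m * tmp;
            if (i < n - 2) {
                dl[i] = du[i + 1];    /* fill-in at position (i, i+2) */
                du[i + 1] = -m * dl[i];
            } else {
                dl[i] = 0.0;
            }
            du[i] = tmp;
            tmp = b[i];
            b[i] = b[i + 1];
            b[i + 1] = tmp - m * b[i + 1];
        }
    }

    if (d[n - 1] == 0.0)
        return -1;

    /* Back substitution; dl now holds the second superdiagonal. */
    b[n - 1] /= d[n - 1];
    if (n > 1)
        b[n - 2] = (b[n - 2] - du[n - 2] * b[n - 1]) / d[n - 2];
    for (int i = n - 3; i >= 0; --i)
        b[i] = (b[i] - du[i] * b[i + 1] - dl[i] * b[i + 2]) / d[i];

    return 0;
}
```

The subdiagonal array is reused to hold the fill-in, so this sketch needs no extra storage beyond the three diagonals and the right-hand side. A system whose first diagonal entry is zero, which elimination without pivoting cannot handle, is solved correctly because the first two rows are interchanged.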
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software