A Comparative Study of Block Incomplete Sparse Approximate Inverses Preconditioning on Tesla K20 and V100 GPUs
Published: 2021-06-30
Issue: 7
Volume: 14
Page: 204
ISSN: 1999-4893
Container-title: Algorithms
Language: en
Short-container-title: Algorithms
Authors: Ma Wenpeng, Yuan Wu, Liu Xiazhen
Abstract
Incomplete Sparse Approximate Inverses (ISAI) have shown some advantages over sparse triangular solves on GPUs when used in incomplete LU (ILU)-based preconditioners. In this paper, we extend the single-GPU Block–ISAI method to multiple GPUs by coupling it with a Block–Jacobi preconditioner, and describe the detailed implementation in the open-source numerical package PETSc. In the experiments, two representative test cases are run, and Block–ISAI is compared on up to four GPUs across two major generations of NVIDIA GPUs (Tesla K20 and Tesla V100). For both cases, Block–Jacobi preconditioning with Block–ISAI (BJPB-ISAI) shows an advantage over the level-scheduling-based triangular solves from the cuSPARSE library, and both the overhead of setting up Block–ISAI and the total wall-clock time of GMRES are greatly reduced on Tesla V100 GPUs compared with Tesla K20 GPUs.
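The core idea behind ISAI can be illustrated with a small, serial sketch. Instead of applying a triangular factor L by forward substitution (an inherently sequential operation), one precomputes a sparse matrix M with the same sparsity pattern as L, chosen so that each row satisfies (M L)[i][k] = 1 if k = i, else 0, for every column k in that row's pattern. Applying the preconditioner then becomes a matrix-vector product, which parallelizes row by row. The block-Jacobi part is mimicked by handling each diagonal block independently and dropping couplings between blocks. This is a hedged illustration under stated assumptions, not the paper's Block–ISAI or its PETSc/GPU implementation: all function names (`isai_lower`, `forward_sub`, `matvec`, `block_jacobi_isai`, `apply_blocks`) are hypothetical, matrices are dense Python lists for readability, only the lower factor is treated (an upper factor U would be handled analogously), and the diagonal blocks are simply carved out of one lower-triangular matrix for brevity.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def isai_lower(L: Matrix) -> Matrix:
    """Hypothetical sketch: approximate inverse M of lower-triangular L,
    restricted to L's sparsity pattern (assumes a nonzero diagonal)."""
    n = len(L)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Pattern of row i (sorted, last entry is the diagonal index i).
        J = [j for j in range(i + 1) if L[i][j] != 0.0]
        m = len(J)
        # Solve the small system L[J,J]^T x = e_last by back substitution;
        # (L[J,J]^T)[a][b] = L[J[b]][J[a]] is upper triangular.
        x = [0.0] * m
        for a in range(m - 1, -1, -1):
            rhs = 1.0 if a == m - 1 else 0.0
            for b in range(a + 1, m):
                rhs -= L[J[b]][J[a]] * x[b]
            x[a] = rhs / L[J[a]][J[a]]
        # Scatter the solution back into row i of M.
        for a, j in enumerate(J):
            M[i][j] = x[a]
    return M


def forward_sub(L: Matrix, b: List[float]) -> List[float]:
    """Exact triangular solve L y = b, the operation ISAI replaces."""
    y = [0.0] * len(L)
    for i in range(len(L)):
        y[i] = (b[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    return y


def matvec(A: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product; every row is independent."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def block_jacobi_isai(L: Matrix, bounds: List[Tuple[int, int]]):
    """Hypothetical: one ISAI per diagonal block [lo, hi); off-block
    entries are ignored, as in a block-Jacobi splitting."""
    return [(lo, hi, isai_lower([row[lo:hi] for row in L[lo:hi]]))
            for lo, hi in bounds]


def apply_blocks(blocks, r: List[float]) -> List[float]:
    """Apply each block's approximate inverse to its slice of r."""
    z = [0.0] * len(r)
    for lo, hi, M in blocks:
        z[lo:hi] = matvec(M, r[lo:hi])
    return z


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0], [1.0, 4.0, 0.0], [0.0, 1.0, 5.0]]
    b = [2.0, 5.0, 6.0]
    print(forward_sub(L, b))               # exact solve
    print(matvec(isai_lower(L), b))        # ISAI approximation
    print(apply_blocks(block_jacobi_isai(L, [(0, 1), (1, 3)]), b))
```

For the 3×3 example, the exact solve gives [1, 1, 1], while the ISAI product gives approximately [1, 1, 0.95]: row 2 of the true inverse has a nonzero entry at column 0 that lies outside L's pattern, so ISAI drops it. This trade of accuracy for a fully parallel application step is what makes the approach attractive on GPUs compared with level-scheduled triangular solves.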
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Innovation Team Support Plan of University Science and Technology of Henan Province
Subject
Computational Mathematics, Computational Theory and Mathematics, Numerical Analysis, Theoretical Computer Science