Fast tridiagonal solvers on the GPU

Published:2010-05 Issue:5 Volume:45 Page:127-136
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
Language:en
Short-container-title:SIGPLAN Not.

Author:

Zhang Yao¹, Cohen Jonathan², Owens John D.¹

Affiliation:

1. University of California, Davis, Davis, CA, USA

2. NVIDIA, Santa Clara, CA, USA

Abstract

We study the performance of three parallel algorithms and their hybrid variants for solving tridiagonal linear systems on a GPU: cyclic reduction (CR), parallel cyclic reduction (PCR) and recursive doubling (RD). We develop an approach to measure, analyze, and optimize the performance of GPU programs in terms of memory access, computation, and control overhead. We find that CR enjoys linear algorithm complexity but suffers from more algorithmic steps and bank conflicts, while PCR and RD have fewer algorithmic steps but do more work each step. To combine the benefits of the basic algorithms, we propose hybrid CR+PCR and CR+RD algorithms, which improve the performance of PCR, RD and CR by 21%, 31% and 61% respectively. Our GPU solvers achieve up to a 28x speedup over a sequential LAPACK solver, and a 12x speedup over a multi-threaded CPU solver.
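As an illustration of one of the algorithms named in the abstract, the following is a minimal sequential sketch of parallel cyclic reduction (PCR) in Python. It is not the authors' GPU implementation: the function name `pcr_solve` and the four-array layout (`a` sub-diagonal, `b` main diagonal, `c` super-diagonal, `d` right-hand side) are assumptions made for this example. At each step, every equation is combined with its neighbors at the current stride to eliminate the two adjacent unknowns; the stride doubles until every equation involves a single unknown. This takes about log2(n) steps with O(n) work per step, which matches the abstract's remark that PCR has fewer steps but does more work in each. The sketch does no pivoting, so it assumes a well-conditioned (for example, diagonally dominant) system.

```python
def pcr_solve(a, b, c, d):
    """Solve a tridiagonal system with parallel cyclic reduction (PCR).

    a: sub-diagonal (a[0] must be 0), b: main diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    Illustrative sketch only: no pivoting, plain Python lists.
    """
    n = len(b)
    a, b, c, d = list(a), list(b), list(c), list(d)
    stride = 1
    while stride < n:
        na, nb, nc, nd = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
        # On a GPU each index i would map to one thread;
        # here the loop over i simply runs sequentially.
        for i in range(n):
            lo, hi = i - stride, i + stride
            nb[i] = b[i]
            nd[i] = d[i]
            if lo >= 0:
                # Add a multiple of equation lo to cancel x[lo].
                alpha = -a[i] / b[lo]
                na[i] = alpha * a[lo]
                nb[i] += alpha * c[lo]
                nd[i] += alpha * d[lo]
            if hi < n:
                # Add a multiple of equation hi to cancel x[hi].
                gamma = -c[i] / b[hi]
                nc[i] = gamma * c[hi]
                nb[i] += gamma * a[hi]
                nd[i] += gamma * d[hi]
        a, b, c, d = na, nb, nc, nd
        stride *= 2
    # Every equation is now decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical test system: diagonally dominant, known solution.
    n = 8
    a = [0.0] + [-1.0] * (n - 1)
    b = [4.0] * n
    c = [-1.0] * (n - 1) + [0.0]
    x_true = [float(i + 1) for i in range(n)]
    d = [
        (a[i] * x_true[i - 1] if i > 0 else 0.0)
        + b[i] * x_true[i]
        + (c[i] * x_true[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]
    x = pcr_solve(a, b, c, d)
    print(max(abs(p - q) for p, q in zip(x, x_true)))
```

The demo builds the right-hand side from a known solution and prints the largest deviation of the computed solution from it. Plain cyclic reduction (CR) differs in that each step updates only half of the remaining equations, which gives linear total work at the cost of roughly twice as many steps.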

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1837853.1693472

References: 31 articles.

1. General-purpose computation using graphics hardware. http://www.gpgpu.org/.

2. NVIDIA CUDA compute unified device architecture programming guide 2009. Version 2.0.

3. Cyclic reduction on distributed shared memory machines

4. On Direct Methods for Solving Poisson’s Equations

Cited by 89 articles.

1. Efficient GPU implementation of the multivariate empirical mode decomposition algorithm;Journal of Computational Science;2023-12

2. Numerical simulation of acoustic streaming in standing waves;Computers & Mathematics with Applications;2023-12

3. Performance Tuning for GPU-Embedded Systems: Machine-Learning-Based and Analytical Model-Driven Tuning Methodologies;2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD);2023-10-17

4. The N-shaped partition method: A novel parallel implementation of the Crank Nicolson algorithm;Computer Physics Communications;2023-06

5. Efficient GPU implementation of a Boltzmann-Schrödinger-Poisson solver for the simulation of nanoscale DG MOSFETs;The Journal of Supercomputing;2023-03-23