Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs

Published: 2019-10-13 Volume: 2019 Page: 1-13
ISSN: 1687-7195
Container-title: International Journal of Reconfigurable Computing
Language: en
Short-container-title: International Journal of Reconfigurable Computing

Author:

Macintosh Hamish J.¹², Banks Jasmine E.¹, Kelson Neil A.²

Affiliation:

1. School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, Queensland 4001, Australia

2. eResearch Office, Division of Research and Innovation, Queensland University of Technology, Brisbane, Queensland 4001, Australia

Abstract

Solving diagonally dominant tridiagonal linear systems is a common problem in scientific high-performance computing (HPC). Furthermore, it is becoming more commonplace for HPC platforms to utilise a heterogeneous combination of computing devices. Whilst it is desirable to design faster implementations of parallel linear system solvers, power consumption concerns are increasing in priority. This work presents the oclspkt routine. The oclspkt routine is a heterogeneous OpenCL implementation of the truncated SPIKE algorithm that can use FPGAs, GPUs, and CPUs to concurrently accelerate the solving of diagonally dominant tridiagonal linear systems. The routine is designed to solve tridiagonal systems of any size and can dynamically allocate optimised workloads to each accelerator in a heterogeneous environment depending on the accelerator’s compute performance. The truncated SPIKE FPGA solver is developed first for optimising OpenCL device kernel performance, global memory bandwidth, and interleaved host to device memory transactions. The FPGA OpenCL kernel code is then refactored and optimised to best exploit the underlying architecture of the CPU and GPU. An optimised TDMA OpenCL kernel is also developed to act as a serial baseline performance comparison for the parallel truncated SPIKE kernel since no FPGA tridiagonal solver capable of solving large tridiagonal systems was available at the time of development. The individual GPU, CPU, and FPGA solvers of the oclspkt routine are 110%, 150%, and 170% faster, respectively, than comparable device-optimised third-party solvers and applicable baselines. Assessing heterogeneous combinations of compute devices, the GPU + FPGA combination is found to have the best compute performance and the FPGA-only configuration is found to have the best overall estimated energy efficiency.

Publisher

Hindawi Limited

Subject

Hardware and Architecture

Link

http://downloads.hindawi.com/journals/ijrc/2019/3679839.pdf

Reference: 14 articles.

1. On Stable Parallel Linear System Solvers

2. A parallel hybrid banded system solver: the SPIKE algorithm

3. SPIKE: A parallel environment for solving banded linear systems

4. Performance and Power Efficient Massive Parallel Computational Model for HPC Heterogeneous Exascale Systems

Cited by 7 articles.

1. High throughput multidimensional tridiagonal system solvers on FPGAs;Proceedings of the 36th ACM International Conference on Supercomputing;2022-06-28

2. FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL;International Workshop on OpenCL;2022-05-10

3. Efficient Hardware Implementation of Error Correcting Codes Classification Algorithm;2021 2nd International Conference on Electronics, Communications and Information Technology (CECIT);2021-12

4. An Overview of Cyber-Physical Systems’ Hardware Architecture Concerning Machine Learning;2021 IEEE/AIAA 40th Digital Avionics Systems Conference (DASC);2021-10-03

5. Vector Operations for Accelerating Expensive Bayesian Computations – A Tutorial Guide;Bayesian Analysis;2021-01-01