Affiliation:
1. University of A Coruña, Spain
Abstract
Solving tridiagonal linear-equation systems is a fundamental computing kernel in a wide range of scientific and engineering applications, and its computation can be modeled with parallel algorithms. These parallel solvers are typically designed to compute problems whose data fit in a common shared-memory space where all the cores taking part in the computation have access. However, when the problem size is large, data cannot be entirely stored in the common shared-memory space, and a high number of high-latency communications are performed. One alternative is to partition the problem among different memory spaces. At this point, conventional parallel algorithms do not facilitate the partition of computation in independent tiles, since each reduction depends on equations that may be in different tiles. This article proposes an algorithm based on a tree reduction, called
the Tree Partitioning Reduction
(TPR) method, which partitions the problem into independent slices that can be partially computed in parallel within different common shared-memory spaces. The TPR method can be implemented for any parallel and distributed programming paradigm. Furthermore, in this work, TPR is efficiently implemented for CUDA GPUs to solve large size problems, providing highly competitive performance results with respect to existing packages, being, on average, 22.03× faster than CUSPARSE.
Funder
Ministry of Economy and Competitiveness of Spain and ERDF
ERDF
Government of Galicia and ERDF funds from the EU
Consolidation Programme of Competitive Reference Groups
Ministry of Education of Spain
Xunta de Galicia
Publisher
Association for Computing Machinery (ACM)
Subject
Applied Mathematics,Software
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献