Evaluating asynchronous Schwarz solvers on GPUs

Published: 2020-08-10 Issue: 3 Volume: 35 Page: 226-236
ISSN: 1094-3420
Container-title: The International Journal of High Performance Computing Applications
Language: en
Short-container-title: The International Journal of High Performance Computing Applications

Author:

Nayak Pratik¹, Cojean Terry¹, Anzt Hartwig¹,²

Affiliation:

1. Karlsruhe Institute of Technology, Karlsruhe, Germany

2. University of Tennessee, Knoxville, USA

Abstract

With the commencement of the exascale computing era, we realize that the majority of the leadership supercomputers are heterogeneous and massively parallel. Even a single node can contain multiple co-processors such as GPUs and multiple CPU cores. For example, ORNL’s Summit combines six NVIDIA Tesla V100 GPUs and 42 IBM Power9 cores on each node. Synchronizing across compute resources of multiple nodes can be prohibitively expensive. Hence, it is necessary to develop and study asynchronous algorithms that circumvent this issue of bulk-synchronous computing. In this study, we examine the asynchronous version of the abstract Restricted Additive Schwarz method as a solver. We do not explicitly synchronize, but allow the communication between the sub-domains to be completely asynchronous, thereby removing the bulk-synchronous nature of the algorithm. We accomplish this by using the one-sided Remote Memory Access (RMA) functions of the MPI standard. We study the benefits of using such an asynchronous solver over its synchronous counterpart. We also study the effect on the global solver of the communication patterns governed by the partitioning and the overlap between the sub-domains. Finally, we show that this concept can render attractive performance benefits over the synchronous counterparts even for a well-balanced problem.

Funder

Helmholtz Association

U.S. Department of Energy

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342020946814

References: 29 articles.

1. The deal.II library, Version 9.0

2. Load-balancing Sparse Matrix Vector Product Kernels on GPUs