Author:
Nina Herrmann, Herbert Kuchen
Abstract
Contemporary HPC hardware typically provides several levels of parallelism, e.g. multiple nodes, each having multiple cores (possibly with vectorization) and accelerators. Efficiently programming such systems usually requires skills in combining several low-level frameworks such as MPI, OpenMP, and CUDA. This overburdens programmers without substantial parallel programming skills. One way to overcome this problem and to abstract from the details of parallel programming is to use algorithmic skeletons. In the present paper, we evaluate the multi-node, multi-CPU, and multi-GPU implementation of the most essential skeletons: Map, Reduce, and Zip. Our main contribution is a discussion of the efficiency of using multiple parallelization levels and a consideration of which fine-tuning settings should be offered to the user.
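To illustrate the programming model the abstract describes, the sketch below gives minimal sequential stand-ins for the Map, Zip, and Reduce skeletons. The function names are hypothetical and not taken from the authors' library; real skeleton implementations would distribute the loops across MPI nodes, OpenMP threads, and CUDA devices while exposing the same high-level interface.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Map skeleton: apply f to every element independently.
// (In a parallel implementation, this loop is the part that is
// split across nodes, cores, and GPUs.)
template <typename T, typename F>
std::vector<T> map_skel(const std::vector<T>& xs, F f) {
    std::vector<T> out;
    out.reserve(xs.size());
    for (const T& x : xs) out.push_back(f(x));
    return out;
}

// Zip skeleton: combine two equally sized containers elementwise.
template <typename T, typename F>
std::vector<T> zip_skel(const std::vector<T>& xs,
                        const std::vector<T>& ys, F f) {
    std::vector<T> out;
    out.reserve(xs.size());
    for (std::size_t i = 0; i < xs.size(); ++i)
        out.push_back(f(xs[i], ys[i]));
    return out;
}

// Reduce skeleton: fold all elements into a single value with an
// associative operator (associativity enables a parallel tree reduction).
template <typename T, typename F>
T reduce_skel(const std::vector<T>& xs, T init, F f) {
    return std::accumulate(xs.begin(), xs.end(), init, f);
}
```

The user only supplies the elementwise functions; the skeleton library decides how to partition the data and which parallelization levels to use, which is exactly the design space the paper evaluates.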
Funder
Westfälische Wilhelms-Universität Münster
Publisher
Springer Science and Business Media LLC
Subject
Information Systems,Theoretical Computer Science,Software
References (16 articles)
1. MPI Forum: MPI standard. https://www.mpi-forum.org/docs/ (2021). Accessed: 10.05.2021
2. OpenMP: The OpenMP API specification for parallel programming. https://www.openmp.org/ (2021). Accessed: 10.05.2021
3. NVIDIA Corporation: CUDA. https://developer.nvidia.com/cuda-zone (2021). Accessed: 10.05.2021
4. Cole, M.I.: Algorithmic skeletons: structured management of parallel computation. Pitman, London (1989)
5. Ernsting, S., Kuchen, H.: Algorithmic skeletons for multi-core, multi-GPU systems and clusters. Int. J. High Perform. Comput. Netw. 7(2), 129–138 (2012)