Portable Node-Level Parallelism for the PGAS Model

Published:2021-06-05 Issue:6 Volume:49 Page:867-885
ISSN:0885-7458
Container-title:International Journal of Parallel Programming
language:en
Short-container-title:Int J Parallel Prog

Author:

Jungblut Pascal, Fürlinger Karl

Abstract

The Partitioned Global Address Space (PGAS) programming model brings intuitive shared memory semantics to distributed memory systems. Even with an abstract and unifying virtual global address space, however, it is challenging to use the full potential of different systems. Without explicit support by the implementation, node-local operations have to be optimized manually for each architecture. A goal of this work is to offer a user-friendly programming model that provides portable performance across systems. In this paper, we present an approach to integrate node-level programming abstractions with the PGAS programming model. We describe the hierarchical data distribution with local patterns and our implementation, MEPHISTO, in C++ using two existing projects. The evaluation of MEPHISTO shows that our approach achieves portable performance while requiring only minimal changes to port it from a CPU-based system to a GPU-based one using a CUDA or HIP back-end.

Funder

Deutsche Forschungsgemeinschaft

Ludwig-Maximilians-Universität München

Publisher

Springer Science and Business Media LLC

Subject

Information Systems, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s10766-021-00718-x.pdf
