Portable Node-Level Parallelism for the PGAS Model
Published: 2021-06-05
Issue: 6
Volume: 49
Page: 867-885
ISSN: 0885-7458
Container-title: International Journal of Parallel Programming
Language: en
Short-container-title: Int J Parallel Prog
Author:
Jungblut Pascal, Fürlinger Karl
Abstract
The Partitioned Global Address Space (PGAS) programming model brings intuitive shared memory semantics to distributed memory systems. Even with an abstract and unifying virtual global address space it is, however, challenging to use the full potential of different systems. Without explicit support by the implementation node-local operations have to be optimized manually for each architecture. A goal of this work is to offer a user-friendly programming model that provides portable performance across systems. In this paper we present an approach to integrate node-level programming abstractions with the PGAS programming model. We describe the hierarchical data distribution with local patterns and our implementation, MEPHISTO, in C++ using two existing projects. The evaluation of MEPHISTO shows that our approach achieves portable performance while requiring only minimal changes to port it from a CPU-based system to a GPU-based one using a CUDA or HIP back-end.
Funder
Deutsche Forschungsgemeinschaft, Ludwig-Maximilians-Universität München
Publisher
Springer Science and Business Media LLC
Subject
Information Systems, Theoretical Computer Science, Software