Efficient, portable implementation of asynchronous multi-place programs

Published: 2009-02-14; Volume: 44; Issue: 4; Pages: 271-282
ISSN: 0362-1340
Container title: ACM SIGPLAN Notices
Language: English
Short container title: SIGPLAN Not.

Authors:

Bikshandi Ganesh¹, Castanos Jose G.², Kodali Sreedhar B.¹, Nandivada V. Krishna³, Peshansky Igor⁴, Saraswat Vijay A.⁴, Sur Sayantan⁴, Varma Pradeep³, Wen Tong⁵

Affiliations:

1. IBM STG, Bangalore, India

2. IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

3. IBM India Research Lab, New Delhi, India

4. IBM T.J. Watson Research Center, Hawthorne, NY, USA

5. Interactive Supercomputing, Boston, MA, USA

Abstract

The X10 programming language is organized around the notion of places (an encapsulation of data and activities operating on the data), partitioned global address space (PGAS), and asynchronous computation and communication. This paper introduces an expressive subset of X10, Flat X10, designed to permit efficient execution across multiple single-threaded places with a simple runtime and without compromising on the productivity of X10. We present the design, implementation and evaluation of a compiler and runtime system for Flat X10. The Flat X10 compiler translates programs into C++ SPMD programs communicating using an active messaging infrastructure. It uses novel techniques to transform explicitly parallel programs into SPMD programs. The runtime system is based on IBM's LAPI (Low-level API) and is easily portable to other libraries such as GASNet and ARMCI. Our implementation realizes performance comparable to hand-written MPI programs for well-known HPC benchmarks such as Random Access, Stream, and FFT, on a Federation-based cluster of Power5 SMPs (with hundreds of processors) and the Blue Gene (with thousands of processors). Submissions based on the work presented in this paper were co-winners of the 2007 and 2008 HPC Challenge Type II Awards.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1594835.1504215

Cited by 8 articles (5 shown):

1. A Comprehensive Exploration of Languages for Parallel Computing. ACM Computing Surveys, 2022-01-18.

2. DisGCo. ACM Transactions on Architecture and Code Optimization, 2020-12-31.

3. Control replication. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017-11-12.

4. Optimizing recursive task parallel programs. Proceedings of the International Conference on Supercomputing (ICS '17), 2017.

5. Retargetable Communication for Distributed Programs. 2016 12th International ACM SIGSOFT Conference on Quality of Software Architectures (QoSA), 2016-04.