Affiliation:
1. Purdue University, West Lafayette, IN, USA
Abstract
We present the first fully automated compiler-runtime system that successfully translates and executes OpenMP shared-address-space programs on laboratory-size clusters, for the complete set of regular, repetitive applications in the NAS Parallel Benchmarks. We introduce a hybrid compiler-runtime translation scheme. Compared to previous work, this scheme features a new runtime data flow analysis and new compiler techniques for improving data affinity and reducing communication costs. We present and discuss the performance of our translated programs, and compare them with the performance of the MPI, HPF and UPC versions of the benchmarks. The results show that our translated programs achieve, on average, 75% of the performance of the hand-coded MPI programs.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
References: 19 articles.
1. Berkeley UPC - Unified Parallel C. Available at: upc.lbl.gov.
2. GCC Unified Parallel C. Available at: www.gccupc.org.
3. UPC NAS Parallel Benchmarks from The George Washington University High Performance Computing Laboratory. Available at: threads.hpcl.gwu.edu/sites/npb-upc.
4. D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, H. D. Simon, V. Venkatakrishnan, and S. K. Weeratunga. The NAS Parallel Benchmarks. 1991.
5. Compiler-assisted dynamic scheduling for effective parallelization of loop nests on multicore processors
Cited by
12 articles.
1. MENPS: A Decentralized Distributed Shared Memory Exploiting RDMA;2020 IEEE/ACM Fourth Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM);2020-11
2. A constraint-based approach to automatic data partitioning for distributed memory execution;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2019-11-17
3. D2P;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2019-11-17
4. libMPNode;Proceedings of the 10th International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM'19;2019
5. HDArray: Parallel Array Interface for Distributed Heterogeneous Devices;Languages and Compilers for Parallel Computing;2019