PARRAY

Published:2012-09-11 Issue:8 Volume:47 Page:171-180
ISSN:0362-1340
Container-title:ACM SIGPLAN Notices
language:en
Short-container-title:SIGPLAN Not.

Author:

Chen Yifeng¹, Cui Xiang¹, Mei Hong²

Affiliation:

1. Peking University, Beijing, China

2. Peking University, Beijing, China

Abstract

This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU clusters. The current practice of software development requires combining several low-level libraries like Pthread, OpenMP, CUDA and MPI. Achieving productivity and portability is hard with different numbers and models of GPUs. PARRAY extends mainstream C programming with novel array types of distinct features: 1) the dimensions of an array type are nested in a tree, conceptually reflecting the memory hierarchy; 2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; 3) threads also form arrays that allow programming in a Single-Program-Multiple-Codeblock (SPMC) style to unify various sophisticated communication patterns. This leads to shorter, more portable and maintainable parallel codes, while the programmer still has control over performance-related features necessary for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization are still possible through building libraries of sub-programs on top of PARRAY. The case study on cluster FFT illustrates a simple 30-line code that 2x outperforms Intel Cluster MKL on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software

Link

https://dl.acm.org/doi/pdf/10.1145/2370036.2145838

Reference: 26 articles.

1. CUDA CUFFT Library Version 2.3. NVIDIA Corp. 2009.

2. Auto-tuning 3-D FFT library for CUDA GPUs

3. A Heterogeneous Parallel Framework for Domain-Specific Languages

4. A domain-specific approach to heterogeneous parallelism

Cited by 6 articles.

1. Distributed programming of a hyperspectral image registration algorithm for heterogeneous GPU clusters;Journal of Parallel and Distributed Computing;2021-05

2. ROS Task Scheduling Algorithm in Multi-Core System;2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS);2019-12

3. Abstract Parallel Array Types and Ghost Cell Update Implementation;Algorithms and Architectures for Parallel Processing;2018

4. High productivity multi-device exploitation with the Heterogeneous Programming Library;Journal of Parallel and Distributed Computing;2017-03

5. MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays;Algorithms and Architectures for Parallel Processing;2016