PARRAY

Author:

Chen Yifeng (1), Cui Xiang (1), Mei Hong (2)

Affiliation:

1. Peking University, Beijing, China

2. Peking University, Beijing, China

Abstract

This paper introduces PARRAY (Parallelizing ARRAYs), a programming interface that supports succinct system-level programming for heterogeneous parallel systems such as GPU clusters. Current software development practice requires combining several low-level libraries such as Pthreads, OpenMP, CUDA and MPI, which makes productivity and portability hard to achieve across different numbers and models of GPUs. PARRAY extends mainstream C programming with novel array types that have three distinctive features: 1) the dimensions of an array type are nested in a tree, conceptually reflecting the memory hierarchy; 2) the definition of an array type may contain references to other array types, allowing sophisticated array types to be created for parallelization; 3) threads also form arrays, which allows programming in a Single-Program-Multiple-Codeblock (SPMC) style that unifies various sophisticated communication patterns. This leads to shorter, more portable and more maintainable parallel code, while the programmer retains control over the performance-related features necessary for deep manual optimization. Although the source-to-source code generator only faithfully generates low-level library calls according to the type information, higher-level programming and automatic performance optimization remain possible by building libraries of sub-programs on top of PARRAY. The case study on cluster FFT presents a simple 30-line program that outperforms Intel Cluster MKL by a factor of two on the Tianhe-1A system with 7168 Fermi GPUs and 14336 CPUs.
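
The first feature in the abstract, dimensions nested in a tree, can be illustrated with a small plain-C sketch. This is not PARRAY syntax and not the code its generator emits; the `Dim` struct, the functions `dim_size` and `dim_offset`, and the example shape (2 nodes, 3 devices per node, 4 elements per device) are all hypothetical names and values chosen only to show how a tree of dimensions might map to a flat memory offset, assuming a row-major layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: NOT PARRAY syntax or its API.
   A dimension is either a leaf with an extent, or a node whose
   sub-dimensions are nested inside it (first child is outermost). */
typedef struct Dim {
    size_t extent;               /* used only when nchildren == 0 */
    size_t nchildren;            /* 0 for a leaf dimension */
    const struct Dim *children;  /* nested sub-dimensions */
} Dim;

/* Number of elements covered by a dimension: product over its leaves. */
size_t dim_size(const Dim *d) {
    if (d->nchildren == 0) return d->extent;
    size_t n = 1;
    for (size_t i = 0; i < d->nchildren; i++)
        n *= dim_size(&d->children[i]);
    return n;
}

/* Flatten per-leaf indices into one row-major offset.
   idx holds one index per leaf, in depth-first order;
   *pos tracks how many leaf indices have been consumed. */
size_t dim_offset(const Dim *d, const size_t *idx, size_t *pos) {
    if (d->nchildren == 0) {
        size_t i = idx[(*pos)++];
        assert(i < d->extent);   /* index must lie inside the leaf extent */
        return i;
    }
    size_t off = 0;
    for (size_t c = 0; c < d->nchildren; c++) {
        const Dim *child = &d->children[c];
        off = off * dim_size(child) + dim_offset(child, idx, pos);
    }
    return off;
}

/* Assumed example shape: ((2 nodes, 3 devices), 4 elements). */
static const Dim example_inner[] = { {2, 0, NULL}, {3, 0, NULL} };
static const Dim example_top[]   = { {0, 2, example_inner}, {4, 0, NULL} };
static const Dim example_root    = { 0, 2, example_top };

/* Offset of element `elem` on device `dev` of node `node`. */
size_t example_offset(size_t node, size_t dev, size_t elem) {
    const size_t idx[] = { node, dev, elem };
    size_t pos = 0;
    return dim_offset(&example_root, idx, &pos);
}
```

In this sketch the inner node groups the node and device dimensions, so the same 24 elements can be addressed either as a two-level (group, element) array or as a three-level (node, device, element) array. For instance, `example_offset(1, 2, 3)` evaluates to `(1*3 + 2)*4 + 3 = 23`.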

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Graphics and Computer-Aided Design, Software


Cited by 6 articles.

1. Distributed programming of a hyperspectral image registration algorithm for heterogeneous GPU clusters;Journal of Parallel and Distributed Computing;2021-05

2. ROS Task Scheduling Algorithm in Multi-Core System;2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS);2019-12

3. Abstract Parallel Array Types and Ghost Cell Update Implementation;Algorithms and Architectures for Parallel Processing;2018

4. High productivity multi-device exploitation with the Heterogeneous Programming Library;Journal of Parallel and Distributed Computing;2017-03

5. MARL-Ped+Hitmap: Towards Improving Agent-Based Simulations with Distributed Arrays;Algorithms and Architectures for Parallel Processing;2016
