Evaluation of XcalableACC with tightly coupled accelerators/InfiniBand hybrid communication on accelerated cluster

Published:2019-01-03 Issue:5 Volume:33 Page:869-884
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en
Short-container-title:The International Journal of High Performance Computing Applications

Author:

Nakao Masahiro¹, Odajima Tetsuya¹, Murai Hitoshi¹, Tabuchi Akihiro², Fujita Norihisa³, Hanawa Toshihiro⁴, Boku Taisuke⁵³, Sato Mitsuhisa¹

Affiliation:

1. RIKEN Center for Computational Science, Kobe, Japan

2. Fujitsu Laboratories Ltd, Kawasaki, Japan

3. Center for Computational Sciences, University of Tsukuba, Tsukuba, Japan

4. Information Technology Center, The University of Tokyo, Tokyo, Japan

5. Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Japan

Abstract

Accelerated clusters, which are cluster systems equipped with accelerators, are among the most common systems in parallel computing. In order to exploit the performance of such systems, it is important to reduce communication latency between accelerator memories. In addition, there is also a need for a programming language that facilitates the maintenance of high performance by such systems. The goal of the present article is to evaluate XcalableACC (XACC), a parallel programming language, with tightly coupled accelerators/InfiniBand (TCA/IB) hybrid communication on an accelerated cluster. TCA/IB hybrid communication is a combination of low-latency communication with TCA and high bandwidth with IB. The XACC language, which is a directive-based language for accelerated clusters, enables programmers to use TCA/IB hybrid communication with ease. In order to evaluate the performance of XACC with TCA/IB hybrid communication, we implemented the lattice quantum chromodynamics (LQCD) mini-application and evaluated the application on our accelerated cluster using up to 64 compute nodes. We also implemented the LQCD mini-application using a combination of CUDA and MPI (CUDA + MPI) and that of OpenACC and MPI (OpenACC + MPI) for comparison with XACC. Performance evaluation revealed that the performance of XACC with TCA/IB hybrid communication is 9% better than that of CUDA + MPI and 18% better than that of OpenACC + MPI. Furthermore, a new extension to XACC was found to increase its performance by a further 7%. Productivity evaluation revealed that XACC requires much less change from the serial LQCD code to implement the parallel LQCD code than CUDA + MPI and OpenACC + MPI. Moreover, since XACC can perform parallelization while maintaining the sequential code image, XACC is highly readable and shows excellent portability due to its directive-based approach.

Funder

Core Research for Evolutional Science and Technology

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342018821163

References: 21 articles.

1. APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters

2. GPU Peer-to-Peer Techniques Applied to a Cluster Interconnect

3. Kokkos: Enabling Performance Portability Across Manycore Architectures

4. Tightly Coupled Accelerators Architecture for Minimizing Communication Latency among Accelerators

Cited by 1 article.

1. An intelligent memory caching architecture for data-intensive multimedia applications;Multimedia Tools and Applications;2020-03-26