Exploiting copy engines for intra-node MPI collective communication

Authors

Cho Joong-Yeon, Seo Pu-Rum, Jin Hyun-Wook

Abstract

As multi-core and many-core processors are widely deployed in high-performance computing systems, efficient intra-node communication becomes increasingly important. Intra-node communication involves data copy operations that move messages from a source buffer to a destination buffer. Researchers have tried to reduce the overhead of this copy operation, but a copy performed by the CPU still consumes CPU resources and hinders the overlap of computation and communication. A copy engine is a hardware component that can move data between intra-node buffers without CPU intervention, so the copy operation can be offloaded from the CPU onto the copy engine. In this paper, we aim to exploit copy engines for MPI blocking collective communication, such as broadcast and gather operations. MPI is a message-passing parallel programming model that provides point-to-point, collective, and one-sided communication. Prior research has used the copy engine for MPI, but support for collective communication has not yet been studied. We propose asynchronism in blocking collective communication and a CE-CPU hybrid approach that uses both the copy engine and the CPU for intra-node collective communication. Measurement results show that the proposed approach reduces the overall execution time of a microbenchmark and a synthetic application, both of which perform collective communication and computation, by up to 72% and 57%, respectively.
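
The idea of sharing one copy between a copy engine and the CPU can be illustrated with a small simulation. The sketch below is not the authors' implementation: a background thread stands in for the copy engine, and the function name `hybrid_copy` and the parameter `ce_fraction` are hypothetical names introduced only for illustration. The abstract does not say how the work is divided between the two, so the fixed split ratio used here is an assumption.

```python
import threading


def hybrid_copy(src: bytes, dst: bytearray, ce_fraction: float = 0.5) -> None:
    """Copy src into dst, sharing the work between a simulated copy engine and the CPU.

    Assumption: the "copy engine" is modeled by a background thread, and the
    split is a fixed fraction of the message. Both choices are illustrative.
    """
    if len(dst) < len(src):
        raise ValueError("destination buffer is too small")
    if not 0.0 <= ce_fraction <= 1.0:
        raise ValueError("ce_fraction must be between 0 and 1")

    # Index where the engine's share ends and the CPU's share begins.
    split = int(len(src) * ce_fraction)

    def engine_copy() -> None:
        # Simulated copy engine: moves the first part of the message.
        dst[:split] = src[:split]

    engine = threading.Thread(target=engine_copy)
    engine.start()

    # The calling thread (the "CPU") copies the remainder in the meantime.
    dst[split:len(src)] = src[split:]

    # Wait for the simulated engine before declaring the copy complete.
    engine.join()


if __name__ == "__main__":
    message = bytes(range(256)) * 16
    receive_buffer = bytearray(len(message))
    hybrid_copy(message, receive_buffer, ce_fraction=0.75)
    print(bytes(receive_buffer) == message)
```

Setting `ce_fraction` to 1.0 models a full offload, which leaves the calling thread free for computation, while 0.0 models a conventional CPU-only copy. Intermediate values model sharing the copy between the two.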

Funder

Ministry of Science and ICT, South Korea

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems, Theoretical Computer Science, Software

