Fine-Grained Multithreading Support for Hybrid Threaded MPI Programming

Published:2010-02 Issue:1 Volume:24 Page:49-57
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en

Author:

Balaji Pavan¹,Buntinas Darius²,Goodell David²,Gropp William³,Thakur Rajeev²

Affiliation:

1. MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY, ARGONNE, IL 60439, USA

2. MATHEMATICS AND COMPUTER SCIENCE DIVISION, ARGONNE NATIONAL LABORATORY, ARGONNE, IL 60439, USA

3. DEPARTMENT OF COMPUTER SCIENCE, UNIVERSITY OF ILLINOIS, URBANA, IL 61801, USA

Abstract

As high-end computing systems continue to grow in scale, recent advances in multi- and many-core architectures have pushed such growth toward denser architectures, that is, more processing elements per physical node, rather than more physical nodes themselves. Although a large number of scientific applications have relied so far on an MPI-everywhere model for programming high-end parallel systems, this model may not be sufficient for future machines, given their physical constraints such as decreasing amounts of memory per processing element and shared caches. As a result, application and computer scientists are exploring alternative programming models that involve using MPI between address spaces and some other threaded model, such as OpenMP, Pthreads, or Intel TBB, within an address space. Such hybrid models require efficient support from an MPI implementation for MPI messages sent from multiple threads simultaneously. In this paper, we explore the issues involved in designing such an implementation. We present four approaches to building a fully thread-safe MPI implementation, with decreasing levels of critical-section granularity (from coarse-grain locks to fine-grain locks to lock-free operations) and correspondingly increasing levels of complexity. We present performance results that demonstrate the performance implications of the different approaches.
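
The granularity spectrum named in the abstract (coarse-grain locks, fine-grain locks, lock-free operations) can be illustrated with a small generic sketch. This is not the paper's MPI implementation, and the abstract does not detail its four approaches; the per-queue counters, the `run_strategy` function, and all constants below are hypothetical stand-ins for shared state that several threads update at once.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4
#define NUM_QUEUES 4
#define OPS_PER_THREAD 100000

enum strategy { COARSE_LOCK, FINE_LOCK, LOCK_FREE };

/* Hypothetical stand-ins for per-queue state in a thread-safe library. */
static long counters[NUM_QUEUES];               /* updated under a mutex */
static atomic_long atomic_counters[NUM_QUEUES]; /* updated without locks */

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t queue_locks[NUM_QUEUES];

static enum strategy current_strategy;

static void *worker(void *arg)
{
    intptr_t id = (intptr_t)arg;
    for (long i = 0; i < OPS_PER_THREAD; i++) {
        /* Rotate over the queues so every thread touches every queue. */
        int q = (int)((id + i) % NUM_QUEUES);
        switch (current_strategy) {
        case COARSE_LOCK: /* one lock serializes all queues */
            pthread_mutex_lock(&global_lock);
            counters[q]++;
            pthread_mutex_unlock(&global_lock);
            break;
        case FINE_LOCK: /* threads contend only on the same queue */
            pthread_mutex_lock(&queue_locks[q]);
            counters[q]++;
            pthread_mutex_unlock(&queue_locks[q]);
            break;
        case LOCK_FREE: /* a single atomic read-modify-write, no lock */
            atomic_fetch_add(&atomic_counters[q], 1);
            break;
        }
    }
    return NULL;
}

/* Runs NUM_THREADS workers under one strategy; returns the total updates. */
long run_strategy(enum strategy s)
{
    pthread_t threads[NUM_THREADS];
    long total = 0;

    current_strategy = s;
    for (int q = 0; q < NUM_QUEUES; q++) {
        counters[q] = 0;
        atomic_store(&atomic_counters[q], 0);
        pthread_mutex_init(&queue_locks[q], NULL);
    }
    for (intptr_t t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, worker, (void *)t);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    for (int q = 0; q < NUM_QUEUES; q++) {
        total += counters[q] + atomic_load(&atomic_counters[q]);
        pthread_mutex_destroy(&queue_locks[q]);
    }
    return total;
}
```

Calling `run_strategy` with each of the three values from a `main` function (compiled with `-pthread`) should return the same total, `NUM_THREADS * OPS_PER_THREAD`, in every case. Only the amount of serialization differs, which is the trade-off the abstract describes: smaller critical sections allow more concurrency but are harder to get right.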

Publisher

SAGE Publications

Subject

Hardware and Architecture,Theoretical Computer Science,Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/1094342009360206

References: 18 articles.

1. OpenMP

2. Solution of a problem in concurrent programming control

Cited by 38 articles.

1. Design and performance evaluation of UCX for the Tofu Interconnect D on Fugaku towards efficient multithreaded communication;The Journal of Supercomputing;2024-06-03

2. Improving Communication Asynchrony and Concurrency for Adaptive MPI Endpoints;2022 IEEE/ACM International Workshop on Exascale MPI (ExaMPI);2022-11

3. Lessons Learned on MPI+Threads Communication;SC22: International Conference for High Performance Computing, Networking, Storage and Analysis;2022-11

4. Parallel multiphysics simulation for the stabilized Optimal Transportation Meshfree (OTM) method;Journal of Computational Science;2022-07

5. Logically Parallel Communication for Fast MPI+Threads Applications;IEEE Transactions on Parallel and Distributed Systems;2021-12-01