A Survey on Malleability Solutions for High-Performance Distributed Computing

Published:2022-05-22 Issue:10 Volume:12 Page:5231
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Aliaga Jose I., Castillo Maribel, Iserte Sergio, Martín-Álvarez Iker, Mayo Rafael

Abstract

Maintaining a high rate of productivity, measured in completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone of the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts are based on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on the fly, that is, to reconfigure the number and layout of processes in running applications. Process malleability involves reallocating resources within the HPC system, managing the processes of the application, and redistributing data among those processes to resume the execution. This manuscript compiles how different frameworks address process malleability, their main features, their integration in resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art review devised as an entry point for researchers who are interested in process malleability.

Funder

Ministerio de Ciencia e Innovación

Valencian Region Government and European Social Funds

Universitat Jaume I

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/10/5231/pdf

References: 77 articles.

1. TOP500

2. Exascale workload characterization and architecture implications

3. Fault tolerance of MPI applications in exascale systems: The ULFM solution

4. A survey of MPI usage in the US exascale computing project

5. Toward Convergence in Job Schedulers for Parallel Supercomputers;Feitelson,1996

Cited by 4 articles.

1. Dynamic spawning of MPI processes applied to malleability;The International Journal of High Performance Computing Applications;2023-05-29

2. Configurable synthetic application for studying malleability in HPC;2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP);2023-03

3. Probabilistic Job History Conversion and Performance Model Generation for Malleable Scheduling Simulations;Lecture Notes in Computer Science;2023

4. A Case Study on PMIx-Usage for Dynamic Resource Management;Lecture Notes in Computer Science;2023