Supporting dynamic allocation of heterogeneous storage resources on HPC systems

Author:

Monniot Julien (1), Tessier François (1), Robert Matthieu (1), Antoniu Gabriel (1)

Affiliation:

1. Univ Rennes, Inria, CNRS, IRISA, Rennes, France

Abstract

Scaling up large‐scale scientific applications on supercomputing facilities depends largely on the ability to scale data storage and retrieval efficiently. However, the gap between I/O and computing performance keeps widening. To address this gap, an increasingly popular approach is to introduce new intermediate storage tiers (such as node‐local storage and burst buffers) between the compute nodes and the traditional global shared parallel file system. Unfortunately, without advanced techniques to allocate and size these resources, they remain underutilized. In this article, we investigate how heterogeneous storage resources can be allocated on a high‐performance computing platform, just like compute resources. To this end, we introduce StorAlloc, a simulator used as a testbed for assessing storage‐aware job scheduling algorithms and evaluating various storage infrastructures. We illustrate its usefulness by showing, through a large series of experiments, how this tool can be used to size a burst‐buffer partition on a top‐tier supercomputer using the job history of a production year.

Publisher

Wiley

Subject

Computational Theory and Mathematics, Computer Networks and Communications, Computer Science Applications, Theoretical Computer Science, Software
