Affiliation:
1. Univ Rennes, Inria, CNRS, IRISA, Rennes, France
Abstract
Scaling up large-scale scientific applications on supercomputing facilities largely depends on the ability to efficiently scale up data storage and retrieval. However, there is an ever-widening gap between I/O and computing performance. To address this gap, an increasingly popular approach consists in introducing new intermediate storage tiers (node-local storage, burst buffers, etc.) between the compute nodes and the traditional global shared parallel file system. Unfortunately, without advanced techniques to allocate and size these resources, they remain underutilized. In this article, we investigate how heterogeneous storage resources can be allocated on a high-performance computing platform, just like compute resources. To this purpose, we introduce StorAlloc, a simulator used as a testbed for assessing storage-aware job scheduling algorithms and evaluating various storage infrastructures. We illustrate its usefulness through a large series of experiments showing how this tool can be used to size a burst-buffer partition on a top-tier supercomputer, using the job history of a production year.
Subject
Computational Theory and Mathematics, Computer Networks and Communications, Computer Science Applications, Theoretical Computer Science, Software