Demystifying asynchronous I/O Interference in HPC applications

Published:2021-05-13 Issue:4 Volume:35 Page:391-412
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en
Short-container-title:The International Journal of High Performance Computing Applications

Author:

Tseng Shu-Mei¹, Nicolae Bogdan², Cappello Franck², Chandramowlishwaran Aparna¹

Affiliation:

1. EECS, University of California Irvine, California, USA

2. MCS, Argonne National Laboratory, Illinois, USA

Abstract

With increasing complexity of HPC workflows, data management services need to perform expensive I/O operations asynchronously in the background, aiming to overlap the I/O with the application runtime. However, this may cause interference due to competition for resources: CPU and memory/network bandwidth. The advent of multi-core architectures has exacerbated this problem, as many I/O operations are issued concurrently, thereby competing not only with the application but also among themselves. Furthermore, the interference patterns can change dynamically in response to variations in application behavior and I/O subsystems (e.g. multiple users sharing a parallel file system). Without a thorough understanding, I/O operations may perform suboptimally, potentially even worse than in the blocking case. To fill this gap, this paper investigates the causes and consequences of interference due to asynchronous I/O on HPC systems. Specifically, we focus on multi-core CPUs and memory bandwidth, isolating the interference due to each resource. Then, we perform an in-depth study to explain the interplay and contention in a variety of resource sharing scenarios, such as varying priority and number of background I/O threads and different I/O strategies (sendfile, read/write, mmap/write), underlining the trade-offs among them. The insights from this study are important both to enable guided optimizations of existing background I/O and to open new opportunities to design advanced asynchronous I/O strategies.
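For readers unfamiliar with the three I/O strategies named in the abstract, the following is a minimal Python sketch of each, plus a helper that runs one of them in a background thread. It is an illustration only and not the authors' benchmark code: the function names, the `CHUNK` size, and the use of Python threads are all assumptions made here, and the `sendfile` variant assumes Linux, where `os.sendfile` accepts a regular file as the destination.

```python
import mmap
import os
import threading

CHUNK = 1 << 20  # 1 MiB per call; an arbitrary size chosen for illustration


def copy_read_write(src: str, dst: str) -> None:
    """read/write: every chunk passes through a user-space buffer."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(CHUNK)
            if not buf:
                break
            fout.write(buf)


def copy_mmap_write(src: str, dst: str) -> None:
    """mmap/write: map the source file, then write() slices of the mapping."""
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        if size == 0:
            return  # an empty file cannot be memory-mapped
        with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            with memoryview(mapped) as view:
                for off in range(0, size, CHUNK):
                    # Slicing a memoryview does not copy the mapped bytes.
                    with view[off:off + CHUNK] as piece:
                        fout.write(piece)


def copy_sendfile(src: str, dst: str) -> None:
    """sendfile: the kernel moves the data without a user-space buffer.

    Assumes Linux, where the output descriptor may be a regular file.
    """
    size = os.path.getsize(src)
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        offset = 0
        while offset < size:
            count = min(CHUNK, size - offset)
            sent = os.sendfile(fout.fileno(), fin.fileno(), offset, count)
            if sent == 0:
                break  # unexpected end of input
            offset += sent


def run_in_background(copy_fn, src: str, dst: str) -> threading.Thread:
    """Start copy_fn(src, dst) in a background thread (hypothetical helper)."""
    worker = threading.Thread(target=copy_fn, args=(src, dst), daemon=True)
    worker.start()
    return worker
```

A caller would start `run_in_background(copy_sendfile, src, dst)`, continue with its own computation, and call `join()` on the returned thread before relying on the destination file. The sketch shows only how the strategies differ in their data path; it does not measure or reproduce the interference effects the paper studies.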

Publisher

SAGE Publications

Subject

Hardware and Architecture,Theoretical Computer Science,Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/10943420211016511

References: 39 articles.

1. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

2. Rucio: Scientific Data Management

3. Storage challenges at Los Alamos National Lab

4. Understanding and Improving Computational Science Storage Access through Continuous Characterization

Cited by 7 articles.

1. Multifacets of lossy compression for scientific data in the Joint-Laboratory of Extreme Scale Computing;Future Generation Computer Systems;2025-02

2. Scalable I/O aggregation for asynchronous multi-level checkpointing;Future Generation Computer Systems;2024-11

3. Concealing Compression-accelerated I/O for HPC Applications through In Situ Task Scheduling;Proceedings of the Nineteenth European Conference on Computer Systems;2024-04-22

4. Modeling Multi-Threaded Aggregated I/O for Asynchronous Checkpointing on HPC Systems;2023 22nd International Symposium on Parallel and Distributed Computing (ISPDC);2023-07

5. Conquering Noise With Hardware Counters on HPC Systems;2022 IEEE/ACM Workshop on Programming and Performance Visualization Tools (ProTools);2022-11