Abstract
With the convergence of big data and HPC (high-performance computing), various machine learning applications and traditional large-scale simulations with a stochastically iterative I/O periodicity are running concurrently on HPC platforms, which poses more challenges on the scarcely shared I/O resources due to the ever-growing data transfer demand. Currently the existing heuristic online and periodic offline I/O scheduling methods for traditional HPC applications with a fixed I/O periodicity are not suitable for the applications with stochastically iterative I/O periodicities, which are required to schedule the concurrent I/Os from different applications under I/O congestion. In this work, we propose an adaptively periodic I/O scheduling (APIO) method that optimizes the system efficiency and application dilation by taking the stochastically iterative I/O periodicity of the applications into account. We first build a periodic offline scheduling method within a specified duration to capture the iterative nature. After that, APIO adjusts the bandwidth allocation to resist stochasticity based on the actual length of the computing phrase. In the case where the specified duration does not satisfy the actual running requirements, the period length will be extended to adapt to the actual duration. Theoretical analysis and extensive simulations demonstrate the efficiency of our proposed I/O scheduling method over the existing online approach.
Funder
Key-Area Research and Development Plan of Guangdong Province
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献