Abstract
In the high-performance computing field, limited, large-capacity resources are shared by allocating available resources to jobs through batch job schedulers. When resources are insufficient, a job must wait in a queue until they become available. Predicting this queue waiting time is very useful for improving overall resource utilization. However, queue waiting time is difficult to predict because it is significantly affected by many factors, such as the scheduling algorithm applied and the characteristics of the executed jobs. This study examines a method of predicting queue waiting time using only the historical log data created by the batch job scheduler. Specifically, a prediction method based on a hidden Markov model is proposed. It has three stages. First, outliers are removed by an outlier detection algorithm that uses a statistics-based parametric method. Second, the parameters of the hidden states are estimated from the observed queue waiting time sequence in the historical job log. Third, the queue waiting interval at time $t+1$ is predicted using the parameters estimated at time $t$. Compared with other prediction methods, experimental results show that the proposed algorithm improves prediction accuracy by up to 60%.
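The first and third stages can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything specific in it is an assumption: the z-score rule stands in for the unspecified "statistics-based parametric method", the interval edges and the two-state parameters `pi`, `A`, and `B` are hypothetical values chosen by hand, and the second stage (parameter estimation, for example with Baum-Welch) is skipped entirely. The prediction step uses the standard forward algorithm for a discrete hidden Markov model.

```python
from statistics import mean, stdev


def remove_outliers(waits, z_max=3.0):
    """Stage 1 (assumed rule): drop waits whose z-score exceeds z_max."""
    mu, sigma = mean(waits), stdev(waits)
    if sigma == 0:
        return list(waits)
    return [w for w in waits if abs(w - mu) / sigma <= z_max]


def to_interval(wait, edges):
    """Map a waiting time to an interval index; edges are upper bounds."""
    for i, upper in enumerate(edges):
        if wait <= upper:
            return i
    return len(edges)


def predict_next_interval(obs, pi, A, B):
    """Stage 3: distribution over waiting intervals at time t+1.

    pi[i]   initial probability of hidden state i
    A[i][j] transition probability from state i to state j
    B[i][k] probability of observing interval k in state i
    """
    n = len(pi)
    # Forward pass, normalized at each step to avoid underflow.
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    for o in obs[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
            for j in range(n)
        ]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    # Propagate the state belief one step, then mix the emissions.
    next_state = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
    return [
        sum(next_state[j] * B[j][k] for j in range(n))
        for k in range(len(B[0]))
    ]


if __name__ == "__main__":
    # Hypothetical waiting times in seconds; 99999 is an injected outlier.
    waits = [30, 45, 20, 50, 40, 35, 25, 600, 700, 650, 620, 99999]
    edges = [60, 300]  # assumed intervals: <=60 s, 60-300 s, >300 s
    obs = [to_interval(w, edges) for w in remove_outliers(waits)]

    # Hand-picked parameters for two hidden states (light, congested).
    pi = [0.6, 0.4]
    A = [[0.8, 0.2], [0.3, 0.7]]
    B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]

    dist = predict_next_interval(obs, pi, A, B)
    print("interval probabilities at t+1:", dist)
    print("most likely interval:", dist.index(max(dist)))
```

In a fuller pipeline, `pi`, `A`, and `B` would be re-estimated from the cleaned log rather than fixed by hand, which is the step this sketch leaves out.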
Publisher
Springer Science and Business Media LLC
Subject
Hardware and Architecture, Information Systems, Theoretical Computer Science, Software
Cited by
6 articles.
1. Quantifying Uncertainty in HPC Job Queue Time Predictions;Practice and Experience in Advanced Research Computing 2024: Human Powered Computing;2024-07-17
2. Tandem Predictions for HPC jobs;Practice and Experience in Advanced Research Computing 2024: Human Powered Computing;2024-07-17
3. Predicting accurate batch queue wait times on production supercomputers by combining machine learning techniques;Concurrency and Computation: Practice and Experience;2024-04-11
4. An Empirical Design and Implementation of Job Scheduling Enhancement for Kubernetes Clusters;2024 International Conference on Information Networking (ICOIN);2024-01-17
5. Approbation of Methods for Supercomputer Job Queue Wait Time Estimation;Lobachevskii Journal of Mathematics;2023-08