Affiliation:
1. Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
Abstract
Disk failure has long been a major problem for data centers, leading to data loss. Current disk failure prediction approaches are mostly offline and assume that the disk labels required for training learning models are available and accurate. However, these offline methods are no longer suitable for disk failure prediction in large-scale data centers. Given the explosive growth of data, most methods do not consider that label values may be difficult to obtain during training, or that the obtained labels may not be completely accurate. These problems further restrict the use of supervised learning and offline modeling in disk failure prediction. In this article, we propose Active Semi-supervised Learning Disk-failure Prediction (ASLDP), a novel disk failure prediction method that combines active learning and semi-supervised learning. Based on the characteristics of data over the disk lifecycle, ASLDP applies active learning to clearly labeled samples, selecting the valuable samples with the greatest probability uncertainty and eliminating redundancy. For samples that are unclearly labeled or unlabeled, ASLDP uses semi-supervised learning to pre-label them by calculating the conditional values of the samples, and enhances generalization ability through active learning. Compared with several state-of-the-art offline and online learning approaches, results on four real-world datasets from Backblaze and Baidu demonstrate that ASLDP achieves stable failure detection rates of 80–85% with low false alarm rates. In addition, we use a dataset from Alibaba to evaluate the generality of ASLDP. Furthermore, ASLDP can overcome the problems of missing sample labels and data redundancy in large data centers, which, to the best of our knowledge, no offline learning method for disk failure prediction has considered or addressed. Finally, ASLDP can predict disk failures 4.9 days in advance with lower overhead and latency.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture
Cited by 3 articles.
1. The DEBS 2024 Grand Challenge: Telemetry Data for Hard Drive Failure Prediction;Proceedings of the 18th ACM International Conference on Distributed and Event-based Systems;2024-06-24
2. ACPR: Adaptive Classification Predictive Repair Method for Different Fault Scenarios;IEEE Access;2024
3. A disk failure prediction model for multiple problems;Frontiers of Information Technology & Electronic Engineering;2023-07