Optimizing Efficiency of Machine Learning Based Hard Disk Failure Prediction by Two-Layer Classification-Based Feature Selection
Published: 2023-06-26
Issue: 13
Volume: 13
Page: 7544
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Wang Han 1, Zhuge Qingfeng 1, Sha Edwin Hsing-Mean 1, Xu Rui 1, Song Yuhong 1
Affiliation:
1. School of Computer Science and Technology, East China Normal University, Shanghai 200063, China
Abstract
Predicting hard disk failure effectively and efficiently can prevent the high costs of data loss in data storage systems. Disk failure prediction based on machine learning (ML) and artificial intelligence (AI) has gained notable attention because of its strong predictive capability. Improving the accuracy and performance of disk failure prediction, however, remains a challenging problem. When a disk failure is imminent, little time is left for the prediction process, which includes building models and making predictions. Faster training enables more timely model updates, while predictions that arrive too late are not only worthless but also waste resources. To improve both prediction quality and modeling timeliness, this paper proposes a two-layer classification-based feature selection scheme. An attribute filter that calculates the importance of each attribute was designed to remove attributes insensitive to failure identification, where importance is derived from classification tree models. Furthermore, an attribute classification method is proposed that groups features according to their correlation coefficients. In the experiments, ML/AI models were applied, including naïve Bayes, random forest, support vector machine, gradient boosted decision tree, convolutional neural networks, and long short-term memory. The results showed that the proposed technique improves the prediction accuracy of ML/AI-based hard disk failure prediction models; in particular, random forest and long short-term memory combined with the proposed technique achieved the best accuracy. Meanwhile, in the best case, the proposed scheme reduced training and prediction latency by 75% and 83%, respectively, compared with the baseline methods.
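As a rough illustration of the two-layer idea described in the abstract, the sketch below ranks SMART attributes with a classification-tree importance score and then prunes highly correlated attributes. The scikit-learn estimator, threshold values, and function name are assumptions made for this example and are not taken from the paper.

```python
# Minimal sketch of a two-layer feature-selection pass over SMART attributes,
# assuming scikit-learn and pandas. The estimator choice, thresholds, and
# function name are illustrative assumptions, not values from the paper.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def two_layer_feature_selection(X: pd.DataFrame, y: pd.Series,
                                importance_thresh: float = 0.01,
                                corr_thresh: float = 0.9) -> list:
    # Layer 1: score each attribute with a classification-tree importance
    # and drop attributes that are insensitive to failure identification.
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    importance = pd.Series(tree.feature_importances_, index=X.columns)
    ranked = importance[importance >= importance_thresh].sort_values(ascending=False)

    # Layer 2: walk the ranked attributes and keep only one representative
    # from each group of highly correlated (|Pearson r| >= corr_thresh) features.
    corr = X[ranked.index].corr().abs()
    selected = []
    for feat in ranked.index:
        if all(corr.loc[feat, kept] < corr_thresh for kept in selected):
            selected.append(feat)
    return selected


# Usage (illustrative): X holds SMART attribute columns, y the failure labels.
# selected = two_layer_feature_selection(X, y)
```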
Funder
NSFC; Shanghai Science and Technology Commission Project
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by: 1 article