Examining the impact of critical attributes on hard drive failure times: Multi‐state models for left‐truncated and right‐censored semi‐competing risks data

Author:

Oakley Jordan L. (1), Forshaw Matthew (2), Philipson Pete (1), Wilson Kevin J. (1)

Affiliation:

1. School of Mathematics, Statistics & Physics, Newcastle University, Newcastle upon Tyne, UK

2. School of Computing, Newcastle University, Newcastle upon Tyne, UK

Abstract

The ability to predict failures in hard disk drives (HDDs) is a major objective of HDD manufacturers since avoiding unexpected failures may prevent data loss, improve service reliability, and reduce data center downtime. Most HDDs are equipped with a threshold‐based monitoring system named self‐monitoring, analysis and reporting technology (SMART). The system collects several performance metrics, called SMART attributes, and detects anomalies that may indicate incipient failures. SMART works as a nascent failure detection method and does not estimate the HDDs' remaining useful life. We define critical attributes and critical states for hard drives using SMART attributes and fit multi‐state models to the resulting semi‐competing risks data. The multi‐state models provide a coherent and novel way to model the failure time of a hard drive and allow us to examine the impact of critical attributes on the failure time of a hard drive. We derive dynamic predictions of conditional survival probabilities, which are adaptive to the state of the drive. Using a dataset of HDDs equipped with SMART, we find that drives are more likely to fail after entering critical states. We evaluate the predictive accuracy of the proposed models with a case study of HDDs equipped with SMART, using the time‐dependent area under the receiver operating characteristic curve (AUC) and the expected prediction error (PE). The results suggest that accounting for changes in the critical attributes improves the accuracy of dynamic predictions.

Funder

Engineering and Physical Sciences Research Council

Publisher

Wiley

Subject

Management Science and Operations Research; General Business, Management and Accounting; Modeling and Simulation
