Reliability and Performance Analysis of Architecture-Based Software Implementing Restarts and Retries Subject to Correlated Component Failures

Author:

Li Xiao-Dan1,Yin Yong-Feng1,Fiondella Lance2

Affiliation:

1. School of Reliability and System Engineering, Beihang University, Beijng 100191, P. R. China

2. Department of Electrical and Computer Engineering, University of Massachusetts, North Dartmouth, MA 02747–2300, USA

Abstract

High reliability and performance are essential attributes of software systems designed for critical real-time applications. To improve the reliability and performance of software, many systems incorporate some form of fault recovery mechanism. However, contemporary models of software reliability and performance rarely consider these fault recovery mechanisms. Another notable shortcoming of many software models is that they make the simplifying assumption that component failures are statistically independent, which disagrees with several experimental studies that have shown that the failures of software components can exhibit correlation. This paper presents an architecture-based model of software reliability and performance that explicitly considers a two-stage fault recovery mechanism implementing component restarts and application-level retries. The application architecture is characterized by a Discrete Time Markov Chain (DTMC) to represent the dynamic branching behavior of control between the components of the application. Correlations between the component failures are computed with an efficient numerical algorithm for a multivariate Bernoulli (MVB) distribution. We illustrate the utility of the model through a case study of an embedded software application. The results suggest that the model can be used to quantify the impact of software fault recovery and correlated component failures on application reliability and performance.

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence,Computer Graphics and Computer-Aided Design,Computer Networks and Communications,Software

Cited by 2 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Performability modeling of safety-critical systems through AADL;International Journal of Information Technology;2022-06-04

2. Some Studies on Performability Analysis of Safety Critical Systems;Computer Science Review;2021-02

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3