gemV-tool: A Comprehensive Soft Error Reliability Estimation Tool for Design Space Exploration

Author:

So Hwisoo (1), Ko Yohan (2), Jung Jinhyo (1), Lee Kyoungwoo (1), Shrivastava Aviral (3)

Affiliation:

1. Department of Computer Science, Yonsei University, 50 Yonsei-ro, Seoul 03722, Republic of Korea

2. Division of Software, Yonsei University, 1 Yonseidae-gil, Wonju 26493, Republic of Korea

3. School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, 660, S Mill Ave, Tempe, AZ 85281, USA

Abstract

With aggressive technology scaling, soft errors have become a major threat in modern computing systems. Several countermeasures have been proposed in the literature and implemented in actual devices. However, their effectiveness in ensuring error-free computing cannot be ascertained without an accurate reliability estimation methodology. Such a methodology can be built on the vulnerability metric: the probability of system failure as a function of the time the program data are exposed to transient faults. In this work, we present gemV-tool, a comprehensive toolset for estimating system vulnerability, based on the cycle-accurate gem5 simulator. The three main characteristics of gemV-tool are: (i) fine-grained modeling: vulnerability modeling at fine granularity through the use of RTL abstraction; (ii) accurate modeling: accurate vulnerability calculation for speculatively executed instructions; and (iii) comprehensive modeling: vulnerability estimation of all the sequential elements in the out-of-order processor core. We validated our vulnerability models through extensive fault injection campaigns, with <3% correlation error and 90% statistical confidence. Using gemV-tool, we made the following observations: (i) the vulnerability of two microarchitectural configurations with similar performance can differ by 82%; (ii) the vulnerability of a processor can vary by more than 10×, depending on the implemented algorithm; and (iii) the vulnerability of each component in the processor varies significantly, depending on the ISA of the processor.
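As a rough illustration of an exposure-time vulnerability metric, the sketch below counts the bit-cycles during which a stored value is later consumed by a read. It assumes a simplified model that the abstract does not spell out: an interval ending in a read is treated as vulnerable, and an interval ending in a write is not, because the old value is overwritten. The names `Access` and `vulnerable_bit_cycles` are hypothetical and are not part of gemV-tool or gem5.

```python
from dataclasses import dataclass


@dataclass
class Access:
    cycle: int  # simulation cycle at which the access happens
    kind: str   # "read" or "write"


def vulnerable_bit_cycles(accesses, width_bits):
    """Return exposure in bit-cycles for one storage element.

    Assumed model (illustrative only): the time between any access and a
    following read is vulnerable; time before a write is not, since a
    fault in that window is overwritten before it can be consumed.
    """
    total_cycles = 0
    last_cycle = None  # cycle of the previous access, if any
    for acc in sorted(accesses, key=lambda a: a.cycle):
        if acc.kind not in ("read", "write"):
            raise ValueError(f"unknown access kind: {acc.kind}")
        # Only intervals that end in a read contribute to exposure.
        if acc.kind == "read" and last_cycle is not None:
            total_cycles += acc.cycle - last_cycle
        last_cycle = acc.cycle
    return total_cycles * width_bits


# Hypothetical access trace for a single 32-bit register.
trace = [
    Access(10, "write"),
    Access(25, "read"),
    Access(40, "read"),
    Access(60, "write"),
    Access(70, "read"),
]
exposure = vulnerable_bit_cycles(trace, width_bits=32)
print(exposure)
```

In this trace the vulnerable intervals are 10 to 25, 25 to 40, and 60 to 70, which is 40 cycles, or 1280 bit-cycles for a 32-bit register. The interval from 40 to 60 is excluded because it ends in a write.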

Funder

National Research Foundation of Korea

National Science Foundation

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Cited by 2 articles.

1. Exploring the Behavior of Soft-Error Rate Reduction Algorithms in Digital Circuits;2024 International Conference on Optimization Computing and Wireless Communication (ICOCWC);2024-01-29

2. Microarchitecturally Exploring Fault-Tolerance and Timing on Silicon on Chip;2024 International Conference on Optimization Computing and Wireless Communication (ICOCWC);2024-01-29
