Affiliation:
1. University Malaysia Pahang, College of Engineering, Department of Electrical Engineering 26300, Pahang, Malaysia
2. College of IoT Engineering, Hohai University Changzhou, Jiangsu 213022, China
Abstract
Convolutional Neural Networks (CNNs) are used extensively in healthcare applications. At present, GPUs are the dominant DNN accelerators, used to speed up the execution of CNN algorithms and thereby improve their performance and latency. However, GPUs are prone to soft errors, which can dramatically affect GPU behavior. The resulting faults may corrupt data values or logic operations and cause errors such as Silent Data Corruption. Unfortunately, soft errors propagate from the physical level (microarchitecture) to the application level (CNN model). This paper analyzes the reliability of the AlexNet model using two metrics: (1) critical kernel vulnerability (CKV), used to identify the malfunction and light-malfunction errors in each kernel, and (2) critical layer vulnerability (CLV), used to track the malfunction and light-malfunction errors through the layers. To this end, we injected faults into AlexNet, which is widely used in healthcare applications, running on an NVIDIA GPU, using the SASSIFI fault injector as the main evaluation tool. The experiments show that the average percentage of errors causing model malfunction was reduced from 3.7% to 0.383% by hardening only the vulnerable parts, at an overhead of only 0.2923%. This is a substantial improvement in model reliability for healthcare applications.
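As a rough check on the figures quoted above, the relative reduction in malfunction-causing errors can be worked out directly from the two reported percentages. The symbols e_before and e_after are introduced here only for illustration and do not appear in the paper.

```latex
\[
\frac{e_{\text{before}} - e_{\text{after}}}{e_{\text{before}}}
  = \frac{3.7 - 0.383}{3.7} \approx 0.896,
\qquad
\frac{e_{\text{before}}}{e_{\text{after}}}
  = \frac{3.7}{0.383} \approx 9.7
\]
```

That is, the reported numbers correspond to roughly a 90% relative drop (about a 9.7-fold reduction) in malfunction-causing errors, against the stated overhead of 0.2923%.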
Publisher
Faculty of Electrical Engineering, Computer Science and Information Technology Osijek
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture
Cited by
3 articles.
1. Smart Campus Reliability Based on the Internet of Things;International Conference on Information Systems and Intelligent Applications;2022-10-23
2. Reliability Assessment of Neural Networks in GPUs: A Framework For Permanent Faults Injections;2022 IEEE 31st International Symposium on Industrial Electronics (ISIE);2022-06-01
3. Optimizing Selective Protection for CNN Resilience;2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE);2021-10