FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture

Author:

Ning Xuefei (1), Ge Guangjun (1), Li Wenshuo (1), Zhu Zhenhua (1), Zheng Yin (2), Chen Xiaoming (3), Gao Zhen (4), Wang Yu (1), Yang Huazhong (1)

Affiliation:

1. Department of Electronic Engineering, Tsinghua University, Beijing, China

2. Weixin Group, Tencent, Beijing, China

3. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China

4. School of Electrical and Information Engineering, Tianjin University, China

Abstract

With the rapid evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When neural networks (NNs) are deployed onto devices in complex environments, various types of faults can occur: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. Thus, the safety risk of deploying NNs is now drawing much attention. In this article, after analyzing the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are resilient to various faults in today's devices. Then, we incorporate fault-tolerant training (FTT) into the search process to achieve better results, which is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net discovered under the feature fault model achieves an accuracy of 86.2% (vs. 68.1% achieved by MobileNet-V2), and W-FTT-Net discovered under the weight fault model achieves an accuracy of 69.6% (vs. 60.8% achieved by ResNet-18). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern influence the fault resilience of NN models.
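The abstract mentions a weight fault model but does not define it. As an illustration only, the sketch below assumes one common formalization: each bit of an 8-bit quantized weight flips independently with a fixed probability. The function name `flip_random_bits`, the bit width, and the flip probability are hypothetical choices for this example and are not taken from the article.

```python
import random


def flip_random_bits(weights, flip_prob, n_bits=8, seed=None):
    """Return a copy of `weights` with random bit flips injected.

    Assumption (not from the article): weights are signed integers stored
    in n_bits-wide two's complement, and every bit flips independently
    with probability `flip_prob`.
    """
    rng = random.Random(seed)
    mask = (1 << n_bits) - 1
    sign_bit = 1 << (n_bits - 1)
    faulty = []
    for w in weights:
        raw = w & mask  # two's-complement bit pattern of the weight
        for bit in range(n_bits):
            if rng.random() < flip_prob:
                raw ^= 1 << bit  # inject a single-bit fault
        # Reinterpret the bit pattern as a signed integer again.
        faulty.append(raw - (1 << n_bits) if raw & sign_bit else raw)
    return faulty


if __name__ == "__main__":
    clean = [5, -3, 127, -128, 0]
    print(flip_random_bits(clean, flip_prob=0.05, seed=0))
```

Under this assumed model, a flip in a high-order bit changes a weight far more than a flip in a low-order bit, which is one way to see why the weight quantization range could affect fault resilience.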

Funder

National Natural Science Foundation of China

National Key R&D Program of China

Beijing National Research Center for Information Science and Technology

Beijing Innovation Center for Future Chips

Tsinghua University and Toyota Joint Research Center for AI Technology of Automated Vehicle

Beijing Academy of Artificial Intelligence

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications

Reference: 57 articles.

1. Analytical techniques for soft error rate modeling and mitigation of FPGA-based designs;Asadi Hossein;IEEE Trans. Very Large Scale Integ. Syst.,2007

2. Bowen Baker, Otkrist Gupta, R. Raskar, and N. Naik. 2017. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823 (2017).

3. Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation

Cited by 14 articles.

1. MRFI: An Open-Source Multiresolution Fault Injection Framework for Neural Network Processing;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2024-07

2. An Overlay Accelerator of DeepLab CNN for Spacecraft Image Segmentation on FPGA;Remote Sensing;2024-03-02

3. TOSA: Tolerating Stuck-At-Faults in Edge-based RRAM Inference Accelerators;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17

4. Soft Error Reliability Analysis of Vision Transformers;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2023-12

5. Design of an experimental setup for the implementation of CNNs in APSoCs;2023 IEEE Colombian Caribbean Conference (C3);2023-11-22
