FTT-NAS: Discovering Fault-tolerant Convolutional Neural Architecture

Published:2021-11-30 Issue:6 Volume:26 Page:1-24
ISSN:1084-4309
Container-title:ACM Transactions on Design Automation of Electronic Systems
language:en
Short-container-title:ACM Trans. Des. Autom. Electron. Syst.

Author:

Ning Xuefei¹, Ge Guangjun¹, Li Wenshuo¹, Zhu Zhenhua¹, Zheng Yin², Chen Xiaoming³, Gao Zhen⁴, Wang Yu¹, Yang Huazhong¹

Affiliation:

1. Department of Electronic Engineering, Tsinghua University, Beijing, China

2. Weixin Group, Tencent, Beijing, China

3. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China

4. School of Electrical and Information Engineering, Tianjin University, China

Abstract

With the rapid evolution of embedded deep-learning computing systems, applications powered by deep learning are moving from the cloud to the edge. When neural networks (NNs) are deployed onto devices in complex environments, various types of faults are possible: soft errors caused by cosmic radiation and radioactive impurities, voltage instability, aging, temperature variations, malicious attackers, and so on. Thus, the safety risk of deploying NNs is now drawing much attention. In this article, after analyzing the possible faults in various types of NN accelerators, we formalize and implement various fault models from the algorithmic perspective. We propose Fault-Tolerant Neural Architecture Search (FT-NAS) to automatically discover convolutional neural network (CNN) architectures that are resilient to various faults in today's devices. Then, we incorporate fault-tolerant training (FTT) into the search process to achieve better results; this approach is referred to as FTT-NAS. Experiments on CIFAR-10 show that the discovered architectures significantly outperform manually designed baseline architectures, with comparable or fewer floating-point operations (FLOPs) and parameters. Specifically, with the same fault settings, F-FTT-Net, discovered under the feature fault model, achieves an accuracy of 86.2% (vs. 68.1% achieved by MobileNet-V2), and W-FTT-Net, discovered under the weight fault model, achieves an accuracy of 69.6% (vs. 60.8% achieved by ResNet-18). By inspecting the discovered architectures, we find that the operation primitives, the weight quantization range, the capacity of the model, and the connection pattern all influence the fault resilience of NN models.

Funder

National Natural Science Foundation of China

National Key R&D Program of China

Beijing National Research Center for Information Science and Technology

Beijing Innovation Center for Future Chips

Tsinghua University and Toyota Joint Research Center for AI Technology of Automated Vehicle

Beijing Academy of Artificial Intelligence

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications

Link

https://dl.acm.org/doi/pdf/10.1145/3460288

References: 57 articles.

1. Analytical techniques for soft error rate modeling and mitigation of FPGA-based designs;Asadi Hossein;IEEE Trans. Very Large Scale Integ. Syst.,2007

2. Bowen Baker, Otkrist Gupta, R. Raskar, and N. Naik. 2017. Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823 (2017).

3. Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation

Cited by 14 articles.

1. MRFI: An Open-Source Multiresolution Fault Injection Framework for Neural Network Processing;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2024-07

2. An Overlay Accelerator of DeepLab CNN for Spacecraft Image Segmentation on FPGA;Remote Sensing;2024-03-02

3. TOSA: Tolerating Stuck-At-Faults in Edge-based RRAM Inference Accelerators;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17

4. Soft Error Reliability Analysis of Vision Transformers;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2023-12

5. Design of an experimental setup for the implementation of CNNs in APSoCs;2023 IEEE Colombian Caribbean Conference (C3);2023-11-22