ALPRI-FI: A Framework for Early Assessment of Hardware Fault Resiliency of DNN Accelerators
Published:2024-08-15
Issue:16
Volume:13
Page:3243
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Mahmoud Karim (1), Nicolici Nicola (1)
Affiliation:
1. Department of Electrical and Computer Engineering, McMaster University, Hamilton, ON L8S 4L8, Canada
Abstract
Understanding how faulty hardware affects machine learning models is important for both safety-critical systems and cloud infrastructure. Because most machine learning models, such as Deep Neural Networks (DNNs), are highly computationally intensive, specialized hardware accelerators are developed to improve performance and energy efficiency. Evaluating the fault resilience of these DNN accelerators during the early design and implementation stages provides timely feedback, making it less costly to revise designs and address potential reliability concerns. To this end, we introduce Architecture-Level Pre-Register-Transfer-Level Implementation Fault Injection (ALPRI-FI), a comprehensive framework for assessing the fault resilience of DNN models deployed on hardware accelerators.
Funder
Natural Sciences and Engineering Research Council (NSERC) of Canada; Innovation, Science and Economic Development Canada