Affiliation:
1. Electrical and Computer Engineering, Lakehead University, Thunder Bay, Canada
2. Electrical and Computer Engineering, Lakehead University, Thunder Bay, Canada
Abstract
Deep neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, this great success comes at the cost of excessive computation. The Volta graphics processing unit (GPU) from NVIDIA introduced a specialized hardware unit called the tensor core (TC), aimed at meeting the growing computation demand of DNNs. Most previous studies on TCs have focused on performance improvement by exploiting the TC's high degree of parallelism. However, as DNNs are deployed in security-sensitive applications such as autonomous driving, the reliability of TCs is as important as their performance.
In this work, we exploit the unique architectural characteristics of TCs and propose a simple, implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning, as well as the sparse activations arising from the common ReLU operator, to verify tensor operations. The high level of sparsity in tensors allows FDTC to run the original and verifying products simultaneously, leading to zero performance penalty. For applications with a low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products, scheduling these verifying products only when multipliers are idle. Our experimental results reveal that FDTC offers 100% fault coverage with no performance penalty and a small energy overhead in TCs.
Publisher
Association for Computing Machinery (ACM)