BitSET: Bit-Serial Early Termination for Computation Reduction in Convolutional Neural Networks

Published: 2023-09-09 Issue: 5s Volume: 22 Page: 1-24
ISSN: 1539-9087
Container-title: ACM Transactions on Embedded Computing Systems
Language: en
Short-container-title: ACM Trans. Embed. Comput. Syst.

Author:

Pan Yunjie¹, Yu Jiecao², Lukefahr Andrew³, Das Reetuparna¹, Mahlke Scott¹

Affiliation:

1. University of Michigan, USA

2. Facebook Inc., USA

3. Indiana University, USA

Abstract

Convolutional Neural Networks (CNNs) have demonstrated remarkable performance across a wide range of machine learning tasks. However, their high accuracy usually comes at the cost of substantial computation and energy consumption, making them difficult to deploy on mobile and embedded devices. In CNNs, the compute-intensive convolutional layers are usually followed by a ReLU activation layer, which clamps negative outputs to zero, resulting in high activation sparsity. To exploit this sparsity, we propose BitSET, a software-hardware co-design that aggressively saves energy during CNN inference. The bit-serial BitSET accelerator adopts a prediction-based, bit-level early termination technique that stops the ineffectual computation of negative outputs early. To assist the algorithm, we propose a novel weight encoding that allows more accurate predictions with fewer bits. BitSET leverages bit-level computation reduction both in the predictive early termination algorithm and in the non-predictive, energy-efficient bit-serial architecture. Compared to UNPU, an energy-efficient bit-serial CNN accelerator, BitSET reduces bit-level computations by 48%, yielding an average 1.5× speedup and 1.4× energy efficiency improvement with no accuracy loss. Relaxing the allowed accuracy loss to 1% raises the average speedup to 1.6×, with a 1.4× energy efficiency improvement.
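The idea of stopping a bit-serial computation once the output is known to be non-positive can be illustrated with a minimal Python sketch. This is not the paper's predictive algorithm or its weight encoding: it assumes unsigned (post-ReLU) activations, signed two's-complement weights processed MSB-first, and a conservative exact bound rather than a prediction. The function name `bit_serial_relu_dot` and its parameters are hypothetical.

```python
def bit_serial_relu_dot(acts, weights, bits=8):
    """Return (relu(dot(acts, weights)), number of weight bit-planes processed).

    Assumptions (illustrative only, not taken from the paper):
    - acts are non-negative integers (outputs of a previous ReLU),
    - weights fit in `bits`-bit two's complement,
    - weight bits are consumed one bit-plane at a time, MSB first.
    """
    act_sum = sum(acts)
    partial = 0
    for step, k in enumerate(range(bits - 1, -1, -1), start=1):
        # Sum the activations whose weight has bit k set.
        plane = sum(a for a, w in zip(acts, weights) if (w >> k) & 1)
        # In two's complement the MSB carries a negative place value.
        sign = -1 if k == bits - 1 else 1
        partial += sign * plane * (1 << k)
        # The remaining bits k-1..0 can add at most (2^k - 1) * sum(acts).
        upper_bound = partial + ((1 << k) - 1) * act_sum
        if upper_bound <= 0:
            # The final sum cannot be positive, so ReLU would output zero.
            return 0, step
    return max(partial, 0), bits


if __name__ == "__main__":
    print(bit_serial_relu_dot([3, 0, 5, 2], [-4, 7, -2, 1]))
    print(bit_serial_relu_dot([1, 2], [3, 4]))
```

With activations `[3, 0, 5, 2]` and weights `[-4, 7, -2, 1]` the true dot product is -20, and this sketch stops after 6 of the 8 bit-planes. A predictive scheme, as the abstract describes, would aim to terminate earlier by accepting a bounded risk of misprediction.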

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3609093

References: 60 articles.

1. Vahideh Akhlaghi, Amir Yazdanbakhsh, Kambiz Samadi, Rajesh K. Gupta, and Hadi Esmaeilzadeh. 2018. SnaPEA: Predictive early activation for reducing computation in deep convolutional neural networks. In ISCA’18. IEEE, 662–673.

2. Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O’Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic deep neural network computing. In MICRO’17. 382–394.

3. Moez Baccouche, Franck Mamalet, Christian Wolf, Christophe Garcia, and Atilla Baskurt. 2011. Sequential deep learning for human action recognition. In International Workshop on Human Behavior Understanding. Springer, 29–39.

4. Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. In International Conference on Machine Learning. PMLR, 527–536.

5. Yu-Hsin Chen et al. 2016. Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. JSSC.