SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme

Published: 2022-11-06  Volume: 22  Issue: 21  Page: 8545
ISSN: 1424-8220
Journal: Sensors
Language: en

Authors:

Chang Liang, Zhao Xin, Zhou Jun

Abstract

Deep neural networks (DNNs) have been deployed on various hardware accelerators, such as graphics processing units (GPUs), field-programmable gate arrays (FPGAs), and application-specific integrated circuit (ASIC) chips. Inference typically requires a large amount of computation, which incurs significant logic resource overhead. In addition, frequent data accesses between off-chip memory and the accelerator create bottlenecks that reduce hardware efficiency. Many solutions have been proposed to reduce hardware overhead and data movement; for example, lookup-table (LUT)-based hardware architectures can be used to reduce the demand for arithmetic operations. However, typical LUT-based accelerators suffer from limited computational precision and poor scalability. In this paper, we propose a search-based computing scheme built on an LUT solution, which improves computational efficiency by replacing conventional multiplication with a search operation. The proposed scheme also supports multiple bit widths (precisions) to meet the needs of different DNN-based applications. We further design a reconfigurable computing strategy that adapts efficiently to convolutions with different kernel sizes, improving hardware scalability. We implement a search-based architecture, named SCA, which adopts an on-chip storage mechanism, greatly reducing interactions with off-chip memory and alleviating bandwidth pressure. Experimental evaluation shows that SCA achieves 92%, 96%, and 98% computational utilization at 4-bit, 8-bit, and 16-bit precision, respectively. Compared with the state-of-the-art LUT-based architecture, efficiency is improved four-fold.
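The abstract's central idea, replacing a multiplication with a table search and scaling that search to wider precisions, can be illustrated with a minimal software sketch. It assumes unsigned operands and a single 4-bit × 4-bit product table, with 8-bit and 16-bit multiplies decomposed into 4-bit partial products that are shifted and summed; the convolution helper shows how the same lookup can serve kernels of any size. The names `PRODUCT_LUT`, `lut_mul`, and `lut_conv2d` are hypothetical, and this is a generic LUT technique, not the SCA hardware design described in the paper.

```python
# Hypothetical sketch of LUT-based "search" multiplication with precision scaling.
# Assumptions (not from the paper): unsigned operands, one 4-bit x 4-bit product
# table, and wider operands split into 4-bit nibbles combined by shift-and-add.

NIBBLE_BITS = 4
NIBBLE_MASK = (1 << NIBBLE_BITS) - 1

# Precompute all 16 x 16 products once; each 4-bit multiply becomes a lookup.
PRODUCT_LUT = [[a * b for b in range(1 << NIBBLE_BITS)]
               for a in range(1 << NIBBLE_BITS)]


def lut_mul_4bit(a: int, b: int) -> int:
    """Multiply two unsigned 4-bit values by table search instead of a multiplier."""
    return PRODUCT_LUT[a & NIBBLE_MASK][b & NIBBLE_MASK]


def lut_mul(a: int, b: int, width: int) -> int:
    """Multiply two unsigned `width`-bit values (e.g. 4, 8 or 16 bits) using
    only 4-bit table lookups, shifts and additions."""
    if width <= 0 or width % NIBBLE_BITS != 0:
        raise ValueError("width must be a positive multiple of 4")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands out of range for the given width")
    n = width // NIBBLE_BITS
    result = 0
    for i in range(n):
        a_nib = (a >> (i * NIBBLE_BITS)) & NIBBLE_MASK
        for j in range(n):
            b_nib = (b >> (j * NIBBLE_BITS)) & NIBBLE_MASK
            # Each partial product is one lookup, weighted by its nibble positions.
            result += lut_mul_4bit(a_nib, b_nib) << ((i + j) * NIBBLE_BITS)
    return result


def lut_conv2d(image, kernel, width: int):
    """Valid 2-D convolution (cross-correlation, as used in DNN layers) for any
    kernel size, with every multiply replaced by lut_mul."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0
            for dy in range(kh):
                for dx in range(kw):
                    acc += lut_mul(image[y + dy][x + dx], kernel[dy][dx], width)
            out[y][x] = acc
    return out
```

For example, `lut_mul(200, 123, 8)` returns 24600 using four table lookups, and `lut_conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]], 4)` returns `[[6, 8], [12, 14]]`. The 4-bit table holds only 256 entries; in this sketch the cost of higher precision shows up as more lookups and additions (16 lookups for a 16-bit multiply) rather than a larger table.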

Funder

National Safety Academic Fund

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/21/8545/pdf

References (38 articles)

1. Metwaly, K., Kim, A., Branson, E., and Monga, V. GlideNet: Global, local and intrinsic based dense embedding network for multi-category attributes prediction. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

2. Dong. Weighted feature fusion of convolutional neural network and graph attention network for hyperspectral image classification. IEEE Trans. Image Process., 2022.

3. Li, W., Chen, Y., Hu, K., and Zhu, J. Oriented RepPoints for aerial object detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

4. Zheng, T., Huang, Y., Liu, Y., Tang, W., Yang, Z., Cai, D., and He, X. CLRNet: Cross Layer Refinement Network for Lane Detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

5. Chan, K.C., Zhou, S., Xu, X., and Loy, C.C. Investigating Tradeoffs in Real-World Video Super-Resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

Cited by (1 article)

1. A 40-nm SONOS Digital CIM Using Simplified LUT Multiplier and Continuous Sample-Hold Sense Amplifier for AI Edge Inference. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023-12.