AoCStream: All-on-Chip CNN Accelerator with Stream-Based Line-Buffer Architecture and Accelerator-Aware Pruning
Author:
Kang Hyeong-Ju1ORCID, Yang Byung-Do2
Affiliation:
1. School of Computer Science and Engineering, Korea University of Technology and Education, 1600 Chungjeol-ro, Dongnam-gu, Cheonan 31253, Republic of Korea 2. School of Electronics Engineering, Chungbuk National University, 1 Chungdae-ro, Seowon-gu, Cheongju 28644, Republic of Korea
Abstract
Convolutional neural networks (CNNs) play a crucial role in many EdgeAI and TinyML applications, but their implementation usually requires external memory, which degrades the feasibility of such resource-hungry environments. To solve this problem, this paper proposes memory-reduction methods at the algorithm and architecture level, implementing a reasonable-performance CNN with the on-chip memory of a practical device. At the algorithm level, accelerator-aware pruning is adopted to reduce the weight memory amount. For activation memory reduction, a stream-based line-buffer architecture is proposed. In the proposed architecture, each layer is implemented by a dedicated block, and the layer blocks operate in a pipelined way. Each block has a line buffer to store a few rows of input data instead of a frame buffer to store the whole feature map, reducing intermediate data-storage size. The experimental results show that the object-detection CNNs of MobileNetV1/V2 and an SSDLite variant, widely used in TinyML applications, can be implemented even on a low-end FPGA without external memory.
Funder
Ministry of Education
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry
Reference38 articles.
1. He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27–30). Deep residual learning for image recognition. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA. 2. Howard, A.G., Zhu, M., Chen, B., Kalenichenki, D., Wang, W., Weyand, T., Andreetto, M., and Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv. 3. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18–22). MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. 4. Tan, M., and Le, Q.V. (2019, January 9–15). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA. 5. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., and Berg, A.C. (2016, January 11–14). SSD: Single shot multibox detector. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands.
|
|