A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator
-
Published:2023-03-27
Issue:7
Volume:12
Page:1571
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
Huang Jiye12ORCID, Liu Xin12ORCID, Guo Tongdong1ORCID, Zhao Zhijin3ORCID
Affiliation:
1. The School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China 2. Zhejiang Provincial Key Lab of Equipment Electronics, Hangzhou 310018, China 3. The School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China
Abstract
Depthwise separable convolution (DSC) significantly reduces parameter and floating operations with an acceptable loss of accuracy and has been widely used in various lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication and accumulation engine is proposed to improve the utilization of computational resources. An efficient convolution algorithm is proposed for depthwise convolution (DWC) and pointwise convolution (PWC), respectively, to reduce the on-chip memory occupancy. Meanwhile, the proposed convolution algorithms achieve partial fusion between PWC and DWC, and improve the off-chip memory access efficiency. To maximise bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA by using Verilog HDL. The experimental results show that the proposed DSC accelerator achieves a performance of 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224×224×3.
Funder
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Reference30 articles.
1. Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., and Miao, Y. (2021). Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens., 13. 2. Recent advances in the application of deep learning methods to forestry;Wang;Wood Sci. Technol.,2021 3. Guo, Z., Huang, Y., Hu, X., Wei, H., and Zhao, B. (2021). A Survey on Deep Learning Based Approaches for Scene Understanding in Autonomous Driving. Electronics, 10. 4. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18–23). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. 5. Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration;Li;IEEE Trans. Circuits Syst.-Regul. Pap.,2021
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|