A High-Performance FPGA-Based Depthwise Separable Convolution Accelerator

Published:2023-03-27 Issue:7 Volume:12 Page:1571
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Huang Jiye¹², Liu Xin¹², Guo Tongdong¹, Zhao Zhijin³

Affiliation:

1. The School of Electronics and Information, Hangzhou Dianzi University, Hangzhou 310018, China

2. Zhejiang Provincial Key Lab of Equipment Electronics, Hangzhou 310018, China

3. The School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract

Depthwise separable convolution (DSC) significantly reduces the number of parameters and floating-point operations with an acceptable loss of accuracy and has been widely used in lightweight convolutional neural network (CNN) models. In practical applications, however, DSC accelerators based on graphics processing units (GPUs) cannot fully exploit the performance of DSC and are unsuitable for mobile application scenarios. Moreover, low resource utilization due to idle engines is a common problem in DSC accelerator design. In this paper, a high-performance DSC hardware accelerator based on field-programmable gate arrays (FPGAs) is proposed. A highly reusable and scalable multiplication and accumulation engine is proposed to improve the utilization of computational resources. Efficient convolution algorithms are proposed for depthwise convolution (DWC) and pointwise convolution (PWC), respectively, to reduce on-chip memory occupancy. These algorithms also achieve partial fusion between PWC and DWC and improve off-chip memory access efficiency. To maximize bandwidth utilization and reduce latency when reading feature maps, an address mapping method for off-chip accesses is proposed. The performance of the proposed accelerator is demonstrated by implementing MobileNetV2 on an Intel Arria 10 GX660 FPGA using Verilog HDL. The experimental results show that the proposed DSC accelerator achieves 205.1 FPS, 128.8 GFLOPS, and 0.24 GOPS/DSP for input images of size 224×224×3.
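
The reduction in parameters and operations that the abstract attributes to DSC can be illustrated with a rough count. The Python sketch below is not taken from the paper: it uses the textbook cost formulas for a standard convolution and for a depthwise-plus-pointwise pair, ignores biases, and assumes a hypothetical layer shape (3×3 kernel, 32 input channels, 64 output channels, 112×112 output) chosen only for illustration. The function names are likewise assumptions.

```python
def standard_conv_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, MACs) for a standard k x k convolution, biases ignored."""
    params = k * k * c_in * c_out
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def dsc_cost(k, c_in, c_out, h_out, w_out):
    """Return (parameters, MACs) for a depthwise k x k plus pointwise 1 x 1 pair."""
    dwc_params = k * k * c_in   # depthwise: one k x k filter per input channel
    pwc_params = c_in * c_out   # pointwise: 1 x 1 filters that mix channels
    params = dwc_params + pwc_params
    macs = params * h_out * w_out
    return params, macs


if __name__ == "__main__":
    # Hypothetical layer shape, for illustration only (not from the paper).
    k, c_in, c_out, h, w = 3, 32, 64, 112, 112
    std_params, std_macs = standard_conv_cost(k, c_in, c_out, h, w)
    sep_params, sep_macs = dsc_cost(k, c_in, c_out, h, w)
    print(f"standard conv: {std_params} params, {std_macs} MACs")
    print(f"DSC:           {sep_params} params, {sep_macs} MACs")
    print(f"reduction:     {std_params / sep_params:.2f}x")
```

Under these assumptions the saving factor equals 1 / (1/c_out + 1/k²), so it approaches k² as the number of output channels grows.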

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/7/1571/pdf

References (30 articles)

1. Chen, L., Li, S., Bai, Q., Yang, J., Jiang, S., and Miao, Y. (2021). Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens., 13.

2. Wang (2021). Recent advances in the application of deep learning methods to forestry. Wood Sci. Technol.

3. Guo, Z., Huang, Y., Hu, X., Wei, H., and Zhao, B. (2021). A Survey on Deep Learning Based Approaches for Scene Understanding in Autonomous Driving. Electronics, 10.

4. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.C. (2018, January 18–23). Mobilenetv2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.

5. Li (2021). Dynamic Dataflow Scheduling and Computation Mapping Techniques for Efficient Depthwise Separable Convolution Acceleration. IEEE Trans. Circuits Syst.-Regul. Pap.

Cited by 2 articles.

1. Hardware Parallel Structure for Convolution Computing in Image Processing;2024 47th International Conference on Telecommunications and Signal Processing (TSP);2024-07-10

2. Efficient Two-Stage Max-Pooling Engines for an FPGA-Based Convolutional Neural Network;Electronics;2023-09-26