FPGA Implementation of a Deep Learning Acceleration Core Architecture for Image Target Detection

Published: 2023-03-24 Issue: 7 Volume: 13 Page: 4144
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Yang Xu¹, Zhuang Chen², Feng Wenquan¹, Yang Zhe¹, Wang Qiang¹

Affiliation:

1. School of Electronic & Information Engineering, Beihang University, Beijing 100080, China

2. Hefei Innovation Research Institute of Beihang University, Hefei 230012, China

Abstract

Because Field Programmable Gate Arrays (FPGAs) are flexible and easy to deploy, a growing number of studies develop and optimize target detection algorithms based on Convolutional Neural Network (CNN) models using FPGAs. Most of these studies, however, concentrate on improving the performance of the core algorithm or on optimizing the hardware structure; few address a unified architecture design and the corresponding optimization techniques for the algorithm model, so overall model performance remains inefficient. The essential reason is that these studies do not address the consistency of arithmetic power, speed, and resources. To solve this problem, we propose an FPGA-based deep learning acceleration core architecture designed for target detection algorithms with CNN models. It uses multi-channel parallelization of the CNN model to improve arithmetic power, uses task scheduling and pipelining of compute-intensive operations to meet the algorithm's data bandwidth requirements, and unifies the speed and area of the orchestrated computation matrix to save hardware resources. The proposed framework achieves 14 Frames Per Second (FPS) inference performance on the TinyYolo model at 5 Giga Operations Per Second (GOPS), with a 30% higher running clock frequency, 2–4 times higher arithmetic power, and 28% higher Digital Signal Processing (DSP) resource utilization efficiency, while using less than 25% of the FPGA's resources.

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/7/4144/pdf

Reference: 35 articles.

1. Redmon, J., Divvala, S., Girshick, R., and Farhadi, A. (2016, January 27–30). You only look once: Unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.

2. Ren, S., He, K., Girshick, R., and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst., 28.

3. Sun, B., Wang, X., Oad, A., Pervez, A., and Dong, F. (2023). Automatic Ship Object Detection Model Based on YOLOv4 with Transformer Mechanism in Remote Sensing Images. Appl. Sci., 13.

4. Sun, Z., Leng, X., Lei, Y., Xiong, B., Ji, K., and Kuang, G. (2021). BiFA-YOLO: A novel YOLO-based method for arbitrary-oriented ship detection in high-resolution SAR images. Remote Sens., 13.

5. Hu, J., Zhi, X., Shi, T., Zhang, W., Cui, Y., and Zhao, S. (2021). PAG-YOLO: A portable attention-guided YOLO network for small ship detection. Remote Sens., 13.

Cited by 4 articles.

1. Fall Detection of the Elderly Using Denoising LSTM-Based Convolutional Variant Autoencoder;IEEE Sensors Journal;2024-06-01

2. Hardware Acceleration for Object Detection using YOLOv5 Deep Learning Algorithm on Xilinx Zynq FPGA Platform;Engineering, Technology & Applied Science Research;2024-02-08

3. Review of Energy-Efficient Embedded System Acceleration of Convolution Neural Networks for Organic Weeding Robots;Agriculture;2023-11-06

4. Structural-Parametric Synthesis of the Geometric Computer Interface;Proceedings of the 33rd International Conference on Computer Graphics and Vision;2023