An OpenCL-Based FPGA Accelerator for Faster R-CNN

Published: 2022-09-23
Issue: 10
Volume: 24
Page: 1346
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

An Jianjing, Zhang Dezheng, Xu Ke, Wang Dong

Abstract

In recent years, convolutional neural network (CNN)-based object detection algorithms have made breakthroughs, and much of the related research has focused on hardware accelerator designs. Although many previous works have proposed efficient FPGA designs for one-stage detectors such as YOLO, there are still few accelerator designs for the Faster R-CNN (faster regions with CNN features) algorithm. Moreover, the inherently high computational and memory complexity of CNNs makes designing efficient accelerators challenging. This paper proposes a software-hardware co-design scheme based on OpenCL to implement the Faster R-CNN object detection algorithm on an FPGA. First, we design an efficient, deeply pipelined FPGA hardware accelerator that can implement Faster R-CNN with different backbone networks. Then, we propose an optimized hardware-aware software algorithm that includes fixed-point quantization, layer fusion, and a multi-batch region-of-interest (RoI) detector. Finally, we present an end-to-end design space exploration scheme to comprehensively evaluate the performance and resource utilization of the proposed accelerator. Experimental results show that the proposed design achieves a peak throughput of 846.9 GOP/s at a working frequency of 172 MHz. Compared with the state-of-the-art Faster R-CNN accelerator and the one-stage YOLO accelerator, our method achieves 10× and 2.1× inference throughput improvements, respectively.
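To put the two reported figures in perspective, the following back-of-the-envelope sketch converts the peak throughput and clock frequency from the abstract into operations per clock cycle. The function names are illustrative, and the convention that one multiply-accumulate (MAC) counts as two operations is an assumption that the abstract does not confirm; the result is a rough scale estimate, not a description of the paper's actual datapath.

```python
# Rough scale check using only the two figures quoted in the abstract.
PEAK_THROUGHPUT_GOPS = 846.9  # peak throughput from the abstract, in GOP/s
CLOCK_MHZ = 172.0             # working frequency from the abstract, in MHz
OPS_PER_MAC = 2               # ASSUMPTION: 1 MAC = 1 multiply + 1 add


def ops_per_cycle(gops: float, mhz: float) -> float:
    """Operations completed per clock cycle at the given throughput."""
    return (gops * 1e9) / (mhz * 1e6)


def macs_per_cycle(gops: float, mhz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Equivalent MACs per cycle under the assumed ops-per-MAC convention."""
    return ops_per_cycle(gops, mhz) / ops_per_mac


if __name__ == "__main__":
    print(f"ops/cycle:  {ops_per_cycle(PEAK_THROUGHPUT_GOPS, CLOCK_MHZ):.1f}")
    print(f"MACs/cycle: {macs_per_cycle(PEAK_THROUGHPUT_GOPS, CLOCK_MHZ):.1f}")
```

Under these assumptions the peak figure corresponds to roughly 4,900 operations, or about 2,460 MACs, per clock cycle, which gives a sense of the degree of parallelism such a design would need to sustain at peak.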

Funder

NNSF of China Grant

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/10/1346/pdf

References: 32 articles.

1. ImageNet classification with deep convolutional neural networks

2. Very deep convolutional networks for large-scale image recognition;Simonyan;arXiv, 2014

3. Deep residual learning for image recognition;He;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016

4. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

5. Fast R-CNN;Girshick;Proceedings of the IEEE International Conference on Computer Vision, 2015

Cited by 4 articles.

1. Fast Inner-Product Algorithms and Architectures for Deep Neural Network Accelerators;IEEE Transactions on Computers;2024-02

2. Hardware Acceleration For Deep Learning Model;2023 International Conference on Microelectronics (ICM);2023-12-17

3. Degree-Aware Graph Neural Network Quantization;Entropy;2023-11-02

4. Boost Correlation Features with 3D-MiIoU-Based Camera-LiDAR Fusion for MODT in Autonomous Driving;Remote Sensing;2023-02-04