Abstract
The convolutional neural network model is an important algorithm for deep learning, and YOLOv3-tiny based on this model has excellent object detection ability. However, the computational power required by the model is still large, and it is difficult to realize the application in the embedded field. This paper proposes a hardware acceleration method for YOLOv3-tiny and implements it on FPGA platform. Firstly, the fixed-point quantitative processing was carried out for the network, and an appropriate fixed-point strategy was designed with the data accuracy as the index. Secondly, the parallel computing design and pipeline optimization principle were carried out, and the FIFO structure was introduced to shorten the running time. Finally, the experiment was carried out on the Xilinx PYNQ-Z2 platform, and the data were compared with the previous related work.
Publisher
Darcy & Roy Press Co. Ltd.