Affiliation:
1. Agency for Defense Development, Yuseong P.O. Box 35, Daejeon 34186, Republic of Korea
Abstract
This paper proposes two max-pooling engines, named the RTB-MAXP engine and the CMB-MAXP engine, with a scalable window size parameter for FPGA-based convolutional neural network (CNN) implementation. The max-pooling operation of a CNN can be decomposed into two stages: a horizontal-axis max-pooling operation followed by a vertical-axis max-pooling operation. The RTB-MAXP engine performs these two one-dimensional max-pooling operations by tracking the rank of the values within the window, whereas the CMB-MAXP engine performs them by cascading maximum operations over the values. Both engines were implemented in VHSIC hardware description language (VHDL) and verified by simulation. The implementation results show that 16 CMB-MAXP engines achieved a throughput of about 9 GB/s (gigabytes per second) while using only about 3% of the available resources on the Xilinx Virtex UltraScale+ FPGA XCVU9P. The 16 RTB-MAXP engines, by contrast, exhibited somewhat lower throughput and resource utilization, although they offered slightly better latency than the CMB-MAXP engines. Compared with existing techniques, the CMB-MAXP engine exhibited comparable implementation results in terms of resource utilization and maximum operating frequency. Notably, only the proposed engines provide runtime window scalability and boundary padding capability, both of which are essential requirements for CNN accelerators. The proposed max-pooling engines were employed and tested in our CNN accelerator targeting the CNN model YOLOv4-CSP-S-Leaky for object detection.
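The two-stage decomposition described in the abstract can be illustrated with a small software model. The following Python sketch is not the authors' VHDL design and does not model the RTB or CMB hardware structures; it only checks that a horizontal one-dimensional max-pooling pass followed by a vertical one gives the same result as a direct two-dimensional window maximum. All function names are hypothetical, and padding the boundary with negative infinity is an assumption, since the abstract does not state which padding value the engines use.

```python
NEG_INF = float("-inf")  # assumed padding value; not specified in the abstract


def max_pool_1d(values, window, stride=1, pad=0):
    """Sliding-window maximum over one line, padded on both ends."""
    padded = [NEG_INF] * pad + list(values) + [NEG_INF] * pad
    return [max(padded[i:i + window])
            for i in range(0, len(padded) - window + 1, stride)]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def max_pool_2d_separable(image, window, stride=1, pad=0):
    """Stage 1: pool every row. Stage 2: pool every column of the result."""
    horizontal = [max_pool_1d(row, window, stride, pad) for row in image]
    vertical = [max_pool_1d(col, window, stride, pad)
                for col in transpose(horizontal)]
    return transpose(vertical)


def max_pool_2d_direct(image, window, stride=1, pad=0):
    """Reference model: maximum over each full window x window patch."""
    width = len(image[0]) + 2 * pad
    border = [[NEG_INF] * width for _ in range(pad)]
    body = [[NEG_INF] * pad + list(row) + [NEG_INF] * pad for row in image]
    padded = border + body + [list(r) for r in border]
    height = len(padded)
    return [[max(padded[i + di][j + dj]
                 for di in range(window) for dj in range(window))
             for j in range(0, width - window + 1, stride)]
            for i in range(0, height - window + 1, stride)]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]

# The window size is an ordinary argument here, loosely mirroring the
# runtime-scalable window parameter mentioned in the abstract.
pooled = max_pool_2d_separable(image, window=2, stride=2)
same_size = max_pool_2d_separable(image, window=3, stride=1, pad=1)
```

The decomposition works because the maximum over a rectangular window equals the maximum of the per-row maxima, so each stage only ever needs to compare values along one axis.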
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
2 articles.
1. Fabric defect detection algorithm based on improved YOLOv8;Textile Research Journal;2024-07-25
2. Hardware Parallel Structure for Convolution Computing in Image Processing;2024 47th International Conference on Telecommunications and Signal Processing (TSP);2024-07-10