Affiliation:
1. State Key Laboratory on Integrated Optoelectronics, College of Electronic Science & Engineering, Jilin University, Changchun 130012, China
2. The Department of Quantum & Computer Engineering, Delft University of Technology, 2628 CD Delft, The Netherlands
3. School of Microelectronics, Dalian University of Technology, Dalian 116620, China
Abstract
Convolutional neural networks (CNNs) have proven to be effective in many application domains, especially in computer vision. To achieve lower-latency CNN processing and reduce power consumption, developers are experimenting with FPGAs to accelerate CNNs in several applications. Current FPGA CNN accelerators usually adopt the same acceleration approach as GPUs, where operations from different network layers are mapped to the same hardware units working in a multiplexed manner. This yields high flexibility in implementing different types of CNNs, but it degrades the latency the accelerator can achieve. Alternatively, latency can be reduced by pipelining the processing of consecutive layers, at the expense of more FPGA resources. The continued increase in hardware resources available in FPGAs makes such implementations feasible for latency-critical application domains. In this paper, we present FPQNet, a fully pipelined and quantized CNN FPGA implementation that is channel-parallel, layer-pipelined, and network-parallel to decrease latency and increase throughput, combined with quantization methods to optimize hardware utilization. In addition, we tailor the hardware architecture to the HDMI timing standard to avoid extra hardware utilization, enabling the accelerator to handle video datasets. We present prototypes of the FPQNet CNN implementations on an Alpha Data 9H7 FPGA, connected via an OpenCAPI interface, to demonstrate the capabilities of the architecture. Results show that, with a 250 MHz clock frequency, an optimized LeNet-5 design achieves a latency as low as 9.32 µs with an accuracy of 98.8% on the MNIST dataset, making it feasible for high-frame-rate video processing applications. With 10 hardware kernels working concurrently, the throughput reaches 1108 GOPS. The methods in this paper are applicable to many other CNNs. Our analysis shows that, using the proposed architecture, the latencies of AlexNet, ZFNet, OverFeat-Fast, and OverFeat-Accurate can be as low as 69.27, 66.95, 182.98, and 132.6 µs, respectively.
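As a quick sanity check of the headline figures, the short Python sketch below (a hypothetical helper script, not part of the paper's artifacts) derives the implied per-frame cycle count, a single-kernel frame-rate upper bound, and per-kernel throughput directly from the numbers quoted in the abstract (9.32 µs latency, 250 MHz clock, 10 kernels, 1108 GOPS).

```python
# Back-of-envelope check of the figures quoted in the abstract.
# The constants below come from the abstract; everything else is assumption.

LATENCY_S = 9.32e-6    # per-frame latency of the optimized LeNet-5 design
CLOCK_HZ = 250e6       # reported clock frequency
NUM_KERNELS = 10       # concurrent hardware kernels
TOTAL_GOPS = 1108      # reported aggregate throughput (giga-operations/s)

# Upper bound on frame rate for one pipelined kernel if a new frame is
# accepted every latency interval (ignores I/O and framing overhead).
max_fps_single = 1.0 / LATENCY_S
print(f"frames/s (single kernel, upper bound): {max_fps_single:,.0f}")

# Clock cycles spent per frame at the reported frequency.
cycles_per_frame = LATENCY_S * CLOCK_HZ
print(f"clock cycles per frame: {cycles_per_frame:,.0f}")

# Average throughput contributed by each of the concurrent kernels.
gops_per_kernel = TOTAL_GOPS / NUM_KERNELS
print(f"throughput per kernel: {gops_per_kernel:.1f} GOPS")
```

Under these assumptions, a single kernel could in principle sustain on the order of 10^5 frames per second, which is consistent with the claim that the design suits high-frame-rate video processing.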
Funder
Innovation Team Support Plan of Dalian
National Natural Science Foundation of China
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
1 article.