Abstract
Standard convolutional neural networks (CNNs) have large amounts of data redundancy, and the same accuracy can be obtained even in lower bit weights instead of floating-point representation. Most CNNs have to be developed and executed on high-end GPU-based workstations, for which it is hard to transplant the existing implementations onto portable edge FPGAs because of the limitation of on-chip block memory storage size and battery capacity. In this paper, we present adaptive pointwise convolution and 2D convolution joint network (AP2D-Net), an ultra-low power and relatively high throughput system combined with dynamic precision weights and activation. Our system has high performance, and we make a trade-off between accuracy and power efficiency by adopting unmanned aerial vehicle (UAV) object detection scenarios. We evaluate our system on the Zynq UltraScale+ MPSoC Ultra96 mobile FPGA platform. The target board can get the real-time speed of 30 fps under 5.6 W, and the FPGA on-chip power is only 0.6 W. The power efficiency of our system is 2.8× better than the best system design on a Jetson TX2 GPU and 1.9× better than the design on a PYNQ-Z1 SoC FPGA.
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Reference42 articles.
1. CUDA Toolkit Documentation: Nvidia Developer Zone—CUDA C Programming Guide v8.0,2017
2. cudnn: Efficient primitives for deep learning;Chetlur;arXiv,2014
3. MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks With All Layers Mapped on Chip
Cited by
9 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献