Affiliation:
1. National University of Defense Technology, Changsha, Hunan, China
Abstract
Deep convolutional neural networks (CNNs) have achieved great success in various computer vision applications. State-of-the-art CNN models for large-scale applications are computation and memory intensive and are therefore mainly processed on high-performance processors such as server CPUs and GPUs. However, there is an increasing demand for high-accuracy or real-time object detection in large-scale clusters and embedded systems, which calls for energy-efficient accelerators, whether to satisfy green-computing requirements or to operate within limited battery budgets. Owing to their energy efficiency and reconfigurability, Field-Programmable Gate Arrays (FPGAs) have been widely explored as CNN accelerators. In this article, we present an in-depth analysis of the computational complexity and memory footprint of each CNN layer type. We then propose a scalable parallel framework that exploits four levels of parallelism in hardware acceleration. We further put forward a systematic design space exploration methodology to search for the optimal solution that maximizes accelerator throughput under FPGA constraints such as on-chip memory, computational resources, external memory bandwidth, and clock frequency. Finally, we demonstrate the methodology by optimizing three representative CNNs (LeNet, AlexNet, and VGG-S) on a Xilinx VC709 board. The three accelerators achieve average performance of 424.7, 445.6, and 473.4 GOP/s, respectively, at a 100 MHz working frequency, significantly outperforming the CPU and previous work.
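To illustrate the kind of design space exploration the abstract describes, below is a minimal sketch in Python, not taken from the paper: it exhaustively enumerates candidate parallelism factors and keeps the feasible configuration with the highest estimated throughput. The resource and throughput models (resources, throughput_gops) and all budget constants are illustrative assumptions, not the authors' cost model or the VC709's exact figures.

```python
# Hypothetical sketch of a design-space-exploration loop: enumerate candidate
# parallelism factors, discard configurations that violate FPGA resource
# limits, and keep the one with the highest estimated throughput.
# All models and constants below are illustrative assumptions.
from itertools import product

# Assumed FPGA budgets (illustrative, not the VC709's exact figures)
DSP_BUDGET = 3600          # DSP slices available
BRAM_BUDGET_KB = 6000      # on-chip memory in KB
BW_BUDGET_GBS = 10.0       # external memory bandwidth in GB/s
FREQ_HZ = 100e6            # 100 MHz working frequency

def resources(p_in, p_out, p_x, p_y):
    """Hypothetical resource model: one MAC unit per parallel lane."""
    lanes = p_in * p_out * p_x * p_y
    dsp = lanes                      # one DSP slice per MAC (assumption)
    bram_kb = 4 * (p_in + p_out)     # on-chip buffer estimate (assumption)
    bw_gbs = 0.002 * lanes           # off-chip bandwidth estimate (assumption)
    return dsp, bram_kb, bw_gbs

def throughput_gops(p_in, p_out, p_x, p_y):
    """2 ops (multiply + accumulate) per lane per cycle."""
    return 2 * p_in * p_out * p_x * p_y * FREQ_HZ / 1e9

best = None
# Four levels of parallelism: input channels, output channels, and the
# two spatial dimensions of the output feature map (assumed mapping).
for p_in, p_out, p_x, p_y in product([1, 2, 4, 8, 16, 32], repeat=4):
    dsp, bram_kb, bw_gbs = resources(p_in, p_out, p_x, p_y)
    if dsp > DSP_BUDGET or bram_kb > BRAM_BUDGET_KB or bw_gbs > BW_BUDGET_GBS:
        continue  # violates an FPGA constraint
    gops = throughput_gops(p_in, p_out, p_x, p_y)
    if best is None or gops > best[0]:
        best = (gops, (p_in, p_out, p_x, p_y))

print(f"best config {best[1]} -> {best[0]:.1f} GOP/s (estimated)")
```

A real exploration flow would substitute measured or analytically derived per-layer cost models for these placeholder formulas; the exhaustive-search structure, however, conveys how throughput can be maximized subject to on-chip memory, compute, and bandwidth constraints.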
Funder
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by
75 articles.