Affiliation:
1. School of Microelectronics, Xi’an Jiaotong University, No. 28, Xianning West Road, Beilin District, Xi’an, Shaanxi 710049, P. R. China
Abstract
The increasingly complicated and versatile convolutional neural networks (CNNs) models bring challenges to hardware acceleration in terms of performance, energy efficiency and flexibility. This paper proposes a reconfigurable neural accelerator (RNA) for flexible and efficient CNN acceleration. To provide hardware flexibility, RNA employs dynamically reconfigurable computing framework to rapidly configure data path between processing elements (PE) at run-time, as well as an interlaced data access mechanism for multi-bank RAM. To achieve high energy efficiency, three optimization mechanisms, including image row broadcasting dataflow (IRBD), tile-by-tile computing (TTC), and zero detection technology (ZDT), are dedicatedly designed for RNA to exploit data reuse and decrease memory bandwidth requirement, which is the key to improving performance and saving power consumption. To save hardware overhead, an online dynamic adaptive data truncation (DADT) mechanism is designed to compensate accuracy loss of multiplication results so that the computational precision in RNA can be reduced from 16-bit to 8-bit, which contributes to reducing the area of data path. The RNA architecture is implemented on Xilinx XC7Z100 FPGA and works at 250[Formula: see text]MHz. Experimental results show that the performance of running LeNet, AlexNet and VGG are 500 GOPS, 598 GOPS and 660 GOPS, respectively. Compared to previous FPGA-based designs, RNA achieves [Formula: see text] performance speedup and [Formula: see text] improvements on energy efficiency.
Funder
National Natural Science Foundation of China
Publisher
World Scientific Pub Co Pte Ltd
Subject
Electrical and Electronic Engineering,Hardware and Architecture,Electrical and Electronic Engineering,Hardware and Architecture