An All-Digital Compute-in-Memory FPGA Architecture for Deep Learning Acceleration

Authors:

Li Yonggen¹, Li Xin¹, Shen Haibin¹, Fan Jicong², Xu Yanfeng², Huang Kejie¹

Affiliations:

1. Zhejiang University, Hangzhou, China

2. China Electronics Technology Group Corporation 58th Research Institute, Wuxi, China

Abstract

The Field Programmable Gate Array (FPGA) is a versatile and programmable hardware platform, which makes it a promising candidate for accelerating Deep Neural Networks (DNNs). However, the computing energy efficiency of FPGAs is low because their energy consumption is dominated by interconnect data movement. In this article, we propose an all-digital Compute-in-Memory (CIM) FPGA architecture for deep learning acceleration. Furthermore, we present a bit-serial computing circuit for the digital CIM core that accelerates vector-matrix multiplication (VMM) operations. A Network-CIM-deployer (NCIMD) is also developed to support the automatic deployment and mapping of DNN networks; NCIMD provides a user-friendly API for DNN models in the Caffe format. Meanwhile, we introduce a weight-stationary dataflow and describe how a single network layer is mapped onto the CIM array of the architecture. We conduct experiments on the proposed FPGA architecture for both Deep Learning (DL) and non-DL workloads, using different architectural layouts and mapping strategies, and compare the results with a conventional FPGA architecture. The experimental results show that, compared to the conventional FPGA architecture, the proposed CIM FPGA architecture improves energy efficiency by up to 16.1× and reduces latency by up to 40%.
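The abstract describes the bit-serial digital CIM computation and the weight-stationary dataflow only at a high level; the paper's actual circuit and mapper are not reproduced here. The snippet below is a minimal Python sketch of the general scheme the abstract refers to: the weight matrix stays resident in the array (weight-stationary), activations are streamed one bit-plane per cycle, and the per-cycle partial products are shifted and accumulated in the periphery. Function names, bit widths, and the unsigned-activation assumption are illustrative choices, not details taken from the paper.

```python
import numpy as np

def bit_serial_vmm(x, W, x_bits=8):
    """Bit-serial vector-matrix multiply: returns x @ W.

    W stays resident in the (simulated) CIM array, while each element of x
    is streamed one bit per cycle, LSB first. Every cycle the array computes
    a 1-bit-activation partial product for all columns in parallel, which is
    then shifted and accumulated outside the array.
    Assumes x holds unsigned integers of at most x_bits bits.
    """
    x = np.asarray(x, dtype=np.int64)
    W = np.asarray(W, dtype=np.int64)
    acc = np.zeros(W.shape[1], dtype=np.int64)
    for b in range(x_bits):            # one cycle per input bit-plane
        x_bit = (x >> b) & 1           # 1-bit slice of every activation
        partial = x_bit @ W            # what the CIM columns compute in parallel
        acc += partial << b            # shift-and-accumulate in the periphery
    return acc

# Quick check against a plain integer matmul
rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=16)      # 8-bit activations
W = rng.integers(-8, 8, size=(16, 4))  # small signed weights held in the array
assert np.array_equal(bit_serial_vmm(x, W), x @ W)
```

Because the shift-and-accumulate is linear, the result is bit-exact with an ordinary integer matmul; the point of the bit-serial form is that each cycle only requires 1-bit activations inside the array, which keeps the in-memory compute circuitry fully digital and simple.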

Funder

National Natural Science Foundation of China

Sino-German Mobility Programme

Publisher

Association for Computing Machinery (ACM)

