Static Scheduling of Weight Programming for DNN Acceleration with Resource Constrained PIM

Author:

Gao Xin¹, Wang Hongyue¹, Chen Yiyan², Zhang Yuhao², Shen Zhaoyan², Ju Lei¹

Affiliation:

1. Shandong University/Quan Cheng Laboratory, China

2. Shandong University, China

Abstract

Most existing architectural studies on ReRAM-based processing-in-memory (PIM) DNN accelerators assume that all weights of the DNN can be mapped to the crossbars at once. However, this assumption is over-idealized: due to technological limitations, the ReRAM crossbar resources available for computation are limited, so multiple weight programming procedures are required during inference. In this paper, we propose a static scheduling framework that generates a mapping between DNN weights and ReRAM cells with minimal runtime weight programming cost. We first build a ReRAM crossbar programming latency model that jointly considers DNN weight patterns, ReRAM programming operations, and PIM architecture characteristics. This model then drives a search process that produces an optimized weight-to-OU (operation unit) mapping table with minimal online programming latency. Finally, an OU scheduler coordinates the activation sequences of OUs in the crossbars so that inference is computed correctly. Evaluation results show that the proposed framework significantly reduces both the weight programming overhead and the overall inference latency for various DNN models across different input data sets.
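To make the framework's middle step concrete, the following is a minimal, hypothetical sketch of a weight-to-OU mapping search: weight tiles are assigned to crossbar operation units (OUs) so that the number of ReRAM cells that must be reprogrammed is minimized. All names, the greedy strategy, and the cell-difference cost model are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: map weight tiles to crossbar OUs, greedily
# minimizing a simple reprogramming-cost model (number of cells whose
# stored value differs from the incoming weight).

def programming_cost(tile, ou_state):
    """Count cells whose currently stored value differs from the new tile."""
    return sum(1 for w, c in zip(tile, ou_state) if w != c)

def greedy_map(tiles, ous):
    """Greedily assign each weight tile to the free OU with the lowest
    reprogramming cost (a stand-in for the paper's optimized search)."""
    mapping = {}
    free = set(range(len(ous)))
    for t_id, tile in enumerate(tiles):
        best = min(free, key=lambda o: programming_cost(tile, ous[o]))
        mapping[t_id] = best
        ous[best] = list(tile)  # the OU now holds this tile's weights
        free.remove(best)
    return mapping

# Toy example: two 4-cell tiles, three OUs holding residual weights.
tiles = [[1, 0, 1, 1], [0, 0, 1, 0]]
ous = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 1, 0]]
print(greedy_map(tiles, ous))  # → {0: 1, 1: 2}
```

A real scheduler would additionally respect OU activation-order constraints (the role of the OU scheduler above) and search globally rather than greedily, but the sketch illustrates why a good mapping reduces online programming latency: fewer cell writes per remapping.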

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture; Software

