Static Scheduling of Weight Programming for DNN Acceleration with Resource Constrained PIM

Author:

Gao Xin¹, Wang Hongyue¹, Chen Yiyan², Zhang Yuhao², Shen Zhaoyan², Ju Lei¹

Affiliation:

1. Shandong University/Quan Cheng Laboratory, China

2. Shandong University, China

Abstract

Most existing architectural studies on ReRAM-based processing-in-memory (PIM) DNN accelerators assume that all weights of the DNN can be mapped to the crossbars at once. However, this assumption is over-idealized: due to technological limitations, the ReRAM crossbar resources available for computation are limited, so multiple weight programming procedures are required during inference. In this paper, we propose a static scheduling framework that generates a mapping between DNN weights and ReRAM cells with minimal runtime weight programming cost. We first build a ReRAM crossbar programming latency model that jointly considers DNN weight patterns, ReRAM programming operations, and PIM architecture characteristics. This model then drives a search process that produces an optimized weight-to-OU (operation unit) mapping table with minimal online programming latency. Finally, an OU scheduler coordinates the activation sequences of OUs in the crossbars so that inference is computed correctly. Evaluation results show that the proposed framework significantly reduces both the weight programming overhead and the overall inference latency for various DNN models across different input data sets.
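To make the framework's middle step concrete, the following is a minimal, hypothetical sketch of a weight-to-OU mapping search: weight tiles are assigned to crossbar operation units (OUs) so that the number of ReRAM cells that must be reprogrammed is minimized. All names, the greedy strategy, and the cell-difference cost model are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: map weight tiles to crossbar OUs, greedily
# minimizing a simple reprogramming-cost model (number of cells whose
# stored value differs from the incoming weight).

def programming_cost(tile, ou_state):
    """Count cells whose currently stored value differs from the new tile."""
    return sum(1 for w, c in zip(tile, ou_state) if w != c)

def greedy_map(tiles, ous):
    """Greedily assign each weight tile to the free OU with the lowest
    reprogramming cost (a stand-in for the paper's optimized search)."""
    mapping = {}
    free = set(range(len(ous)))
    for t_id, tile in enumerate(tiles):
        best = min(free, key=lambda o: programming_cost(tile, ous[o]))
        mapping[t_id] = best
        ous[best] = list(tile)  # the OU now holds this tile's weights
        free.remove(best)
    return mapping

# Toy example: two 4-cell tiles, three OUs holding residual weights.
tiles = [[1, 0, 1, 1], [0, 0, 1, 0]]
ous = [[0, 0, 0, 0], [1, 0, 1, 0], [0, 0, 1, 0]]
print(greedy_map(tiles, ous))  # → {0: 1, 1: 2}
```

A real scheduler would additionally respect OU activation-order constraints (the role of the OU scheduler above) and search globally rather than greedily, but the sketch illustrates why a good mapping reduces online programming latency: fewer cell writes per remapping.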

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture; Software

