DDAM: Data Distribution-Aware Mapping of CNNs on Processing-In-Memory Systems

Published:2023-03-19 Issue:3 Volume:28 Page:1-30
ISSN:1084-4309
Container-title:ACM Transactions on Design Automation of Electronic Systems
language:en
Short-container-title:ACM Trans. Des. Autom. Electron. Syst.

Author:

Wang Junpeng¹, Du Haitao¹, Ding Bo¹, Xu Qi¹, Chen Song², Kang Yi²

Affiliation:

1. University of Science and Technology of China, Hefei, Anhui, P.R. China

2. University of Science and Technology of China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei, Anhui, P.R. China

Abstract

Convolutional neural networks (CNNs) are widely used in image processing, natural language processing, and many other fields. Their large volume of memory accesses is a major concern in CNN accelerator design because it limits both performance and energy efficiency. With fast and low-cost memory access, a Processing-In-Memory (PIM) system is a feasible way to alleviate this memory concern. However, the distributed way in which PIM systems store data conflicts with the heavy data reuse of CNN layers: nodes of a PIM system may need to share their data with each other before processing a CNN layer, which adds communication overhead. In this article, we propose DDAM to map CNNs onto PIM systems with reduced communication overhead. First, a data transfer strategy is proposed to handle the data sharing requirement among PIM nodes by formulating it as a Traveling Salesman Problem (TSP). Second, to improve data locality, a dynamic programming algorithm is proposed to partition the CNN and allocate a number of nodes to each part. Finally, an integer linear programming (ILP)-based mapping algorithm is proposed to map the partitioned CNN onto the PIM system. Experimental results show that, compared to the baselines, DDAM achieves a 2.0× higher throughput with the energy cost reduced by 37% on average.
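The abstract names a dynamic programming step that partitions the CNN and allocates nodes to each part, but it does not give the formulation. The sketch below is only an illustration of that kind of algorithm, not the paper's method. It assumes a linear chain of layers, a hypothetical cost model (per-part latency = total work divided by node count, plus a fixed data-sharing penalty `share_cost` per extra node), and a pipeline objective that minimizes the slowest part. The function name `partition_layers` and all parameters are invented for this example.

```python
def partition_layers(work, num_nodes, share_cost=1.0):
    """Split a chain of layers into contiguous parts and give each part nodes.

    Returns (bottleneck, parts), where parts is a list of
    (first_layer, last_layer, nodes) tuples. Assumes num_nodes >= 1.
    """
    n_layers = len(work)
    prefix = [0.0]
    for w in work:
        prefix.append(prefix[-1] + w)

    def part_cost(j, i, k):
        # Hypothetical model: work of layers [j, i) splits evenly over k
        # nodes, and every extra node adds a fixed data-sharing penalty.
        return (prefix[i] - prefix[j]) / k + share_cost * (k - 1)

    inf = float("inf")
    # best[i][n]: smallest bottleneck for layers [0, i) using exactly n nodes.
    best = [[inf] * (num_nodes + 1) for _ in range(n_layers + 1)]
    choice = [[None] * (num_nodes + 1) for _ in range(n_layers + 1)]
    best[0][0] = 0.0
    for i in range(1, n_layers + 1):
        for n in range(1, num_nodes + 1):
            for j in range(i):                 # last part covers layers [j, i)
                for k in range(1, n + 1):      # last part uses k nodes
                    if best[j][n - k] == inf:
                        continue
                    cand = max(best[j][n - k], part_cost(j, i, k))
                    if cand < best[i][n]:
                        best[i][n] = cand
                        choice[i][n] = (j, k)

    # Nodes may be left idle, so pick the best node count up to num_nodes.
    n = min(range(1, num_nodes + 1), key=lambda m: best[n_layers][m])
    bottleneck = best[n_layers][n]

    # Walk the recorded choices backwards to recover the parts.
    parts = []
    i = n_layers
    while i > 0:
        j, k = choice[i][n]
        parts.append((j, i - 1, k))
        i, n = j, n - k
    parts.reverse()
    return bottleneck, parts
```

For example, `partition_layers([8, 8, 2, 2], 4)` puts the first layer on two nodes and the remaining three layers on the other two. Under this toy model the sharing penalty makes a single four-node part slower than two smaller parts, which is the locality trade-off the abstract alludes to.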

Funder

National Key R&D Program of China

National Natural Science Foundation of China

CAS Project for Young Scientists in Basic Research

Strategic Priority Research Program of Chinese Academy of Sciences

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications

Link

https://dl.acm.org/doi/pdf/10.1145/3576196

References: 45 articles.

1. 2018. Hybrid memory cube – HMC Gen2. (2018) 105. Retrieved from https://www.micron.com/-/media/client/global/documents/products/data-sheet/hmc/gen2/hmc_gen2.pdf. Accessed May 1 2022.

2. Fused-layer CNN accelerators

3. Irwan Bello William Fedus Xianzhi Du Ekin D. Cubuk Aravind Srinivas Tsung-Yi Lin Jonathon Shlens and Barret Zoph. 2021. Revisiting ResNets: Improved training and scaling strategies. arXiv:2103.07579. Retrieved from https://arxiv.org/abs/2103.07579.

4. Communication Lower Bound in Convolution Accelerators

5. DaDianNao: A Machine-Learning Supercomputer

Cited by 4 articles.

1. Load Balanced PIM-Based Graph Processing;ACM Transactions on Design Automation of Electronic Systems;2024-06-21

2. ILP-based Multi-Branch CNNs Mapping on Processing-in-Memory Architecture;2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS);2024-04-22

3. PIM-trie: A Skew-resistant Trie for Processing-in-Memory;Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures;2023-06-17

4. NicePIM: Design Space Exploration for Processing-In-Memory DNN Accelerators With 3D-Stacked-DRAM;IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems;2023