Affiliation:
1. National University of Defense Technology, Changsha, Hunan, China
2. National University of Defense Technology, China
3. Sun Yat-sen University, China
4. University of Pittsburgh, USA
Abstract
PIM (processing-in-memory) based hardware accelerators have shown great potential in addressing the computation and memory access intensity of modern CNNs (convolutional neural networks). Adopting NVM (non-volatile memory) helps to further mitigate the storage and energy consumption overhead, and adopting quantization, e.g., shift-based quantization, helps to trade off the computation overhead against the accuracy loss; however, naively integrating both NVM and quantization in hardware accelerators leads to sub-optimal acceleration.
In this paper, we exploit the natural shift property of DWM (domain wall memory) to devise DWMAcc, a DWM-based accelerator with asymmetrical storage of weight and input data, to speed up the inference phase of shift-based CNNs. DWMAcc supports flexible shift operations to enable fast processing with low performance and area overhead. We then optimize it with zero-sharing, input-reuse, and weight-share schemes. Our experimental results show that, on average, DWMAcc achieves 16.6× performance improvement and 85.6× energy consumption reduction over a state-of-the-art SRAM based design.
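To make the shift-based quantization idea concrete: in such schemes each weight is approximated by a signed power of two, so every multiply in a convolution reduces to a bit shift plus an add, which is the operation that DWM's natural shift property maps onto. The C sketch below illustrates only the general technique, not DWMAcc's exact scheme; the helper names (quantize_shift, shift_dot) are hypothetical, inputs are assumed nonnegative (e.g., post-ReLU activations) so the shifts are well-defined, and zero weights are simply skipped.

#include <stdio.h>
#include <math.h>

/* Illustrative sketch of shift-based quantization; not DWMAcc's exact scheme.
 * A weight w is approximated by sign * 2^k, so x*w becomes a bit shift. */
static void quantize_shift(float w, int *sign, int *k) {
    *sign = (w < 0.0f) ? -1 : 1;
    *k = (int)lroundf(log2f(fabsf(w)));  /* nearest power-of-two exponent */
}

/* Dot product with every multiply replaced by a shift and an add.
 * x[] is assumed nonnegative (e.g., post-ReLU), so shifting is well-defined. */
static long shift_dot(const int *x, const float *w, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++) {
        if (w[i] == 0.0f) continue;  /* zero weights contribute nothing */
        int sign, k;
        quantize_shift(w[i], &sign, &k);
        long term = (k >= 0) ? ((long)x[i] << k) : ((long)x[i] >> -k);
        acc += sign * term;
    }
    return acc;
}

int main(void) {
    int x[3] = {3, 5, 7};
    float w[3] = {0.5f, 2.0f, -4.0f};    /* already exact powers of two */
    /* Prints -17; the exact result is -16.5 (the right shift truncates). */
    printf("%ld\n", shift_dot(x, w, 3));
    return 0;
}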
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by: 3 articles.