Affiliation:
1. National University of Defense Technology, Changsha, Hunan, China
2. National University of Defense Technology, China
3. Sun Yat-sen University, China
4. University of Pittsburgh, USA
Abstract
PIM (processing-in-memory) based hardware accelerators have shown great potential in addressing the computation and memory access intensity of modern CNNs (convolutional neural networks). Adopting NVM (non-volatile memory) helps to further mitigate the storage and energy consumption overhead, and adopting quantization, e.g., shift-based quantization, helps to trade off the computation overhead against the accuracy loss; however, naively integrating both NVM and quantization in hardware accelerators leads to sub-optimal acceleration.
In this paper, we exploit the natural shift property of DWM (domain wall memory) to devise DWMAcc, a DWM-based accelerator with asymmetrical storage of weight and input data, to speed up the inference phase of shift-based CNNs. DWMAcc supports flexible shift operations to enable fast processing with low performance and area overhead. We then optimize it with zero-sharing, input-reuse, and weight-share schemes. Our experimental results show that, on average, DWMAcc achieves 16.6× performance improvement and 85.6× energy consumption reduction over a state-of-the-art SRAM based design.
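To make the shift-based quantization idea concrete: in such schemes each weight is approximated by a signed power of two, so every multiply in a convolution reduces to a bit shift plus an add, which is the operation that DWM's natural shift property maps onto. The C sketch below illustrates only the general technique, not DWMAcc's exact scheme; the helper names (quantize_shift, shift_dot) are hypothetical, inputs are assumed nonnegative (e.g., post-ReLU activations) so the shifts are well-defined, and zero weights are simply skipped.

#include <stdio.h>
#include <math.h>

/* Illustrative sketch of shift-based quantization; not DWMAcc's exact scheme.
 * A weight w is approximated by sign * 2^k, so x*w becomes a bit shift. */
static void quantize_shift(float w, int *sign, int *k) {
    *sign = (w < 0.0f) ? -1 : 1;
    *k = (int)lroundf(log2f(fabsf(w)));  /* nearest power-of-two exponent */
}

/* Dot product with every multiply replaced by a shift and an add.
 * x[] is assumed nonnegative (e.g., post-ReLU), so shifting is well-defined. */
static long shift_dot(const int *x, const float *w, int n) {
    long acc = 0;
    for (int i = 0; i < n; i++) {
        if (w[i] == 0.0f) continue;  /* zero weights contribute nothing */
        int sign, k;
        quantize_shift(w[i], &sign, &k);
        long term = (k >= 0) ? ((long)x[i] << k) : ((long)x[i] >> -k);
        acc += sign * term;
    }
    return acc;
}

int main(void) {
    int x[3] = {3, 5, 7};
    float w[3] = {0.5f, 2.0f, -4.0f};    /* already exact powers of two */
    /* Prints -17; the exact result is -16.5 (the right shift truncates). */
    printf("%ld\n", shift_dot(x, w, 3));
    return 0;
}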
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by: 3 articles.