Eyeriss

Published: 2016-10-12 Issue: 3 Volume: 44 Pages: 367-379
ISSN: 0163-5964
Container title: ACM SIGARCH Computer Architecture News
Language: en
Short container title: SIGARCH Comput. Archit. News

Author:

Chen Yu-Hsin¹, Emer Joel², Sze Vivienne¹

Affiliation:

1. EECS, MIT

2. NVIDIA Research, NVIDIA

Abstract

Deep convolutional neural networks (CNNs) are widely used in modern AI systems for their superior accuracy but at the cost of high computational complexity. The complexity comes from the need to simultaneously process hundreds of filters and channels in the high-dimensional convolutions, which involve a significant amount of data movement. Although highly-parallel compute paradigms, such as SIMD/SIMT, effectively address the computation requirement to achieve high throughput, energy consumption still remains high as data movement can be more expensive than computation. Accordingly, finding a dataflow that supports parallel processing with minimal data movement cost is crucial to achieving energy-efficient CNN processing without compromising accuracy. In this paper, we present a novel dataflow, called row-stationary (RS), that minimizes data movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature map pixels, i.e., activations, in the high-dimensional convolutions, and minimizing data movement of partial sum accumulations. Unlike dataflows used in existing designs, which only reduce certain types of data movement, the proposed RS dataflow can adapt to different CNN shape configurations and reduces all types of data movement through maximally utilizing the processing engine (PE) local storage, direct inter-PE communication and spatial parallelism. To evaluate the energy efficiency of the different dataflows, we propose an analysis framework that compares energy cost under the same hardware area and processing parallelism constraints. Experiments using the CNN configurations of AlexNet show that the proposed RS dataflow is more energy efficient than existing dataflows in both convolutional (1.4× to 2.5×) and fully-connected layers (at least 1.3× for batch size larger than 16). The RS dataflow has also been demonstrated on a fabricated chip, which verifies our energy analysis.
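The abstract does not spell out how the RS dataflow maps work onto PEs. As a rough illustration only, the sketch below assumes the decomposition usually associated with row-stationary processing: a 2D convolution is split into 1D row convolutions, each handled by one PE that keeps a filter row and an input row locally, and the resulting partial-sum rows are accumulated across filter rows. The function names `conv1d_row` and `conv2d_row_stationary` are hypothetical, and the code models only the arithmetic (stride 1, no padding, single channel), not the hardware, its storage hierarchy, or its energy cost.

```python
def conv1d_row(filter_row, ifmap_row):
    """1D primitive (assumed per-PE work): slide one filter row over one input row."""
    width = len(filter_row)
    out_len = len(ifmap_row) - width + 1
    return [
        sum(w * x for w, x in zip(filter_row, ifmap_row[i:i + width]))
        for i in range(out_len)
    ]


def conv2d_row_stationary(filt, ifmap):
    """2D convolution (stride 1, no padding) assembled from 1D row primitives.

    Output row e is the sum over filter rows r of
    conv1d_row(filt[r], ifmap[e + r]); the running sum stands in for
    partial sums being accumulated from one PE to the next.
    """
    num_filter_rows = len(filt)
    num_out_rows = len(ifmap) - num_filter_rows + 1
    ofmap = []
    for e in range(num_out_rows):
        psum_row = None
        for r in range(num_filter_rows):
            partial = conv1d_row(filt[r], ifmap[e + r])
            if psum_row is None:
                psum_row = partial
            else:
                psum_row = [a + b for a, b in zip(psum_row, partial)]
        ofmap.append(psum_row)
    return ofmap


if __name__ == "__main__":
    filt = [[1, 0], [0, 1]]
    ifmap = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(conv2d_row_stationary(filt, ifmap))
```

In this toy form, each filter row is reused across every output row and each input row is reused by several (filter row, output row) pairs, which is the kind of local reuse the abstract refers to.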

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3007787.3001177

References: 43 articles.

1. Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, 2015.

2. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in NIPS, 2012.

3. Very Deep Convolutional Networks for Large-Scale Image Recognition;Simonyan K.;CoRR,2014

4. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going Deeper With Convolutions," in IEEE CVPR, 2015.

5. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in IEEE CVPR, 2016.

Cited by 426 articles.

1. Exploiting Temporal-Unrolled Parallelism for Energy-Efficient SNN Acceleration;IEEE Transactions on Parallel and Distributed Systems;2024-10

2. ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection;ACM Transactions on Architecture and Code Optimization;2024-09-14

3. Scratchpad Memory Management for Deep Learning Accelerators;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12

4. M‐DFCPP: A runtime library for multi‐machine dataflow computing;Concurrency and Computation: Practice and Experience;2024-08-07

5. Systolic Array Acceleration of Spiking Neural Networks with Application-Independent Split-Time Temporal Coding;Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design;2024-08-05