CNNFlow: Memory-driven Data Flow Optimization for Convolutional Neural Networks

Authors:

Qi Nie1, Sharad Malik1

Affiliation:

1. Princeton University, Princeton, NJ

Abstract

Convolutional Neural Networks (CNNs) are widely deployed in computer vision applications. The data sets involved are large, and data reuse across different parts of the computation is heavily interleaved. Given that memory access (SRAM and especially DRAM) is more expensive in both performance and energy than computation, maximizing data reuse to reduce data movement across the memory hierarchy is critical to improving execution efficiency. This is even more important for the common use case of CNNs on mobile devices, where computing and memory resources are limited. We propose CNNFlow, a memory-driven dataflow optimization framework that automatically schedules CNN computation on a given CNN architecture to maximize data reuse at each level of the memory hierarchy. We provide a mathematical formulation of data reuse in terms of scheduling parameters, including loop ordering, loop blocking, and memory-bank allocation for the CNN tensors. We then present a series of techniques that prune the large search space and reduce the cost of the exploration. This provides, for the first time, an exact and practical search algorithm for optimal solutions that minimize memory access cost for CNNs. The efficacy is demonstrated for two widely used CNNs: AlexNet and VGG16, with 5 and 13 convolution layers, respectively. CNNFlow finds the optimal solution for each layer within tens of minutes of compute time. Its solution requires about 20% fewer DRAM accesses and 40%–80% fewer SRAM accesses than state-of-the-art algorithms in the literature.
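
The loop-ordering and blocking parameters mentioned in the abstract can be made concrete with a small sketch. The following Python/NumPy example is illustrative only and is not taken from the paper: conv_naive is a direct seven-loop convolution, and conv_blocked tiles its output-channel and spatial loops with hypothetical block sizes Tk, Ty, Tx, standing in for the kind of scheduling parameters a framework such as CNNFlow would choose so that each tile's working set (output tile, the input patch it reads, and the weights it needs) fits in a given SRAM level and is reused from there rather than re-fetched from DRAM.

import numpy as np

def conv_naive(inp, wts):
    """Direct convolution: inp is (C, H, W), wts is (K, C, R, S), stride 1, no padding."""
    K, C, R, S = wts.shape
    _, H, W = inp.shape
    Ho, Wo = H - R + 1, W - S + 1
    out = np.zeros((K, Ho, Wo))
    for k in range(K):
        for y in range(Ho):
            for x in range(Wo):
                for c in range(C):
                    for r in range(R):
                        for s in range(S):
                            out[k, y, x] += inp[c, y + r, x + s] * wts[k, c, r, s]
    return out

def conv_blocked(inp, wts, Tk=4, Ty=8, Tx=8):
    """Same convolution, with the output-channel and spatial loops tiled.

    Tk, Ty, Tx are illustrative block sizes (assumptions, not values from the
    paper): each (Tk x Ty x Tx) output tile and the data it touches is what a
    memory-driven scheduler would size to fit in one level of on-chip SRAM.
    """
    K, C, R, S = wts.shape
    _, H, W = inp.shape
    Ho, Wo = H - R + 1, W - S + 1
    out = np.zeros((K, Ho, Wo))
    for k0 in range(0, K, Tk):           # tile over output channels
        for y0 in range(0, Ho, Ty):      # tile over output rows
            for x0 in range(0, Wo, Tx):  # tile over output columns
                for k in range(k0, min(k0 + Tk, K)):
                    for y in range(y0, min(y0 + Ty, Ho)):
                        for x in range(x0, min(x0 + Tx, Wo)):
                            for c in range(C):
                                for r in range(R):
                                    for s in range(S):
                                        out[k, y, x] += inp[c, y + r, x + s] * wts[k, c, r, s]
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inp = rng.standard_normal((3, 16, 16))
    wts = rng.standard_normal((8, 3, 3, 3))
    # The blocked schedule changes only the order of memory accesses, not the result.
    assert np.allclose(conv_naive(inp, wts), conv_blocked(inp, wts))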

Publisher

Association for Computing Machinery (ACM)

Subject

Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications


Cited by 1 article:

1. Exact Scheduling to Minimize Off-Chip Data Movement for Deep Learning Accelerators. In 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024-01-22.
