Optimus: An Operator Fusion Framework for Deep Neural Networks

Authors:

Cai Xuyi¹, Wang Ying², Zhang Lei³

Affiliations:

1. Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China

2. Zhejiang Lab; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

3. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

Abstract

Reducing the parameters and operations of deep neural network (DNN) architectures for applications on embedded and IoT platforms has received increasing attention. In contrast, the intermediate feature maps of such lightweight neural networks grow and often exceed the on-chip memory capacity, becoming the new bottleneck and incurring a considerable number of power-consuming off-chip memory accesses. To reduce these feature-induced memory accesses, operator fusion has been proposed to parallelize the execution of multiple convolutional layers, and it has shown significant reductions in off-chip memory accesses. However, how to fuse the neural operators remains a challenging issue that depends heavily on both the neural network (NN) topology and the specific DNN accelerator configuration. In this work, we observe that prior operator-fusion approaches fail to guarantee memory-level optimality because they search a constrained operator-fusion design space. Considering the complexity of NN topologies and the constrained resources of DNN accelerators, we develop a novel operator fusion framework, Optimus. Optimus includes an accurate memory cost model, dedicated to the scheduler, for evaluating candidate operator-fusion schemes, and a directed-acyclic-graph-based operator fusion algorithm for both off-line and on-line workload deployment scenarios; together, these generate high-efficiency operator-fusion solutions for arbitrary network models running on DNN accelerators. Experimental results show that, compared with the baselines, Optimus reduces off-chip memory accesses by 17–75% and achieves 1.86×–3.66× better energy efficiency on state-of-the-art DNN workloads, bringing a significant power-efficiency boost to DNN accelerators with different architectures and dataflows.
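To make the abstract's core idea concrete, the Python sketch below illustrates why fusing layers cuts off-chip traffic: when a fusion group's working set fits in the on-chip buffer, intermediate feature maps never spill to DRAM. This is a minimal, hypothetical toy model, not the Optimus cost model or algorithm from the paper; the Layer fields, the buffer capacity, the traffic formulas, and the greedy grouping heuristic are all assumptions made for illustration.

```python
# Toy model of layer fusion vs. per-layer execution.
# Hypothetical sketch only; not the paper's actual cost model.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Layer:
    name: str
    ifmap_bytes: int    # input feature-map size in bytes
    ofmap_bytes: int    # output feature-map size in bytes
    weight_bytes: int   # filter-weight size in bytes

ON_CHIP_BYTES = 1024 * 1024  # assumed on-chip buffer capacity (1 MiB)

def unfused_traffic(chain: List[Layer]) -> int:
    """Off-chip traffic when each layer runs alone: every intermediate
    feature map is written to DRAM and read back by the next layer."""
    traffic = chain[0].ifmap_bytes + chain[-1].ofmap_bytes
    for layer in chain[:-1]:
        traffic += 2 * layer.ofmap_bytes   # spill + reload intermediate
    return traffic + sum(l.weight_bytes for l in chain)

def fused_traffic(chain: List[Layer]) -> Optional[int]:
    """Off-chip traffic when the chain is fused: intermediates stay on
    chip, provided the working set fits in the on-chip buffer."""
    largest_inter = max((l.ofmap_bytes for l in chain[:-1]), default=0)
    working_set = largest_inter + sum(l.weight_bytes for l in chain)
    if working_set > ON_CHIP_BYTES:
        return None                        # fusion infeasible in this toy model
    return (chain[0].ifmap_bytes + chain[-1].ofmap_bytes
            + sum(l.weight_bytes for l in chain))

def greedy_fuse(layers: List[Layer]) -> List[List[Layer]]:
    """Greedily grow a fusion group while it still fits on chip."""
    groups, current = [], [layers[0]]
    for layer in layers[1:]:
        if fused_traffic(current + [layer]) is not None:
            current.append(layer)
        else:
            groups.append(current)
            current = [layer]
    groups.append(current)
    return groups

if __name__ == "__main__":
    net = [Layer("conv1", 600_000, 800_000, 30_000),
           Layer("conv2", 800_000, 400_000, 60_000),
           Layer("conv3", 400_000, 200_000, 120_000)]
    for group in greedy_fuse(net):
        print([l.name for l in group],
              "fused:", fused_traffic(group),
              "unfused:", unfused_traffic(group))
```

On this made-up three-layer chain, the greedy pass fuses all three layers and the modeled off-chip traffic drops from roughly 3.4 MB to roughly 1.0 MB, a saving of the same order as the 17–75% range the abstract reports. The paper's DAG-based algorithm additionally handles branching topologies and accelerator-specific dataflows, which this linear-chain toy deliberately omits.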

Funder

National Natural Science Foundation of China

Strategic Priority Research Program of the Chinese Academy of Sciences

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software


Cited by 7 articles.

1. ML-Fusion: Determining Memory Levels for Data Reuse Between DNN Layers. Proceedings of the Great Lakes Symposium on VLSI 2024, 2024-06-12.

2. CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2024-04-27.

3. DeepFrack: A Comprehensive Framework for Layer Fusion, Face Tiling, and Efficient Mapping in DNN Hardware Accelerators. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024-03-25.

4. YFlows: Systematic Dataflow Exploration and Code Generation for Efficient Neural Network Inference using SIMD Architectures on CPUs. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, 2024-02-17.

5. Operator Fusion Scheduling Optimization for TVM Deep Learning Compilers. 2023 3rd International Symposium on Computer Technology and Information Science (ISCTIS), 2023-07-07.
