SLEXNet: Adaptive Inference Using Slimmable Early Exit Neural Networks

Published: 2024-09-11 Issue: 6 Volume: 23 Pages: 1-29
ISSN: 1539-9087
Container-title: ACM Transactions on Embedded Computing Systems
Language: en
Short-container-title: ACM Trans. Embed. Comput. Syst.

Author:

Kutukcu Basar¹, Baidya Sabur², Dey Sujit¹

Affiliation:

1. Electrical and Computer Engineering, University of California San Diego, La Jolla, United States

2. Computer Science and Engineering, University of Louisville, Louisville, United States

Abstract

Deep learning is a proven method in many applications. However, it requires substantial computational resources and usually relies on a fixed architecture. Mobile systems are good candidates to benefit from deep learning applications since they are closely integrated into people’s lives. However, mobile systems experience varying conditions for the same reason. Fixed deep learning architectures cannot satisfy application requirements when the available resources vary, so dynamic deep learning architectures are needed. In this work, we propose SLEXNet, a slimmable early exit neural network architecture. SLEXNet combines dynamic depth and dynamic width architectures to adapt to varying time and power conditions. Moreover, we propose a runtime scheduling algorithm that can estimate the inference time and power consumption of SLEXNet variations at runtime. We train SLEXNet on real aerial drone images and implement the runtime on NVIDIA Jetson Orin. Across a wide range of experiments, we show that our approach responds significantly better to time and power requirements under varying conditions than baseline dynamic depth and dynamic width techniques.
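To make the idea of adapting width and depth to time and power budgets concrete, the following is a minimal illustrative sketch, not the scheduling algorithm from the paper. It assumes a hypothetical table of network variations, each identified by a width multiplier and an early-exit index, with estimated latency, power, and accuracy. The `Variant` class, the `pick_variant` function, and every number in `VARIANTS` are placeholders invented for illustration and are not results reported by the authors.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """One hypothetical network variation (all fields are illustrative)."""
    width: float         # fraction of channels kept (dynamic width)
    exit_index: int      # which early exit is used (dynamic depth)
    est_time_ms: float   # estimated inference time, placeholder value
    est_power_w: float   # estimated power draw, placeholder value
    est_accuracy: float  # estimated accuracy, placeholder value


def pick_variant(
    variants: Sequence[Variant],
    time_budget_ms: float,
    power_budget_w: float,
) -> Optional[Variant]:
    """Return the most accurate variant that fits both budgets, or None."""
    feasible = [
        v for v in variants
        if v.est_time_ms <= time_budget_ms and v.est_power_w <= power_budget_w
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda v: v.est_accuracy)


# Placeholder estimates, made up for this example only.
VARIANTS = [
    Variant(width=0.5, exit_index=1, est_time_ms=4.0, est_power_w=8.0, est_accuracy=0.70),
    Variant(width=0.5, exit_index=2, est_time_ms=7.0, est_power_w=9.0, est_accuracy=0.78),
    Variant(width=1.0, exit_index=1, est_time_ms=6.0, est_power_w=12.0, est_accuracy=0.76),
    Variant(width=1.0, exit_index=2, est_time_ms=11.0, est_power_w=15.0, est_accuracy=0.85),
]

if __name__ == "__main__":
    print(pick_variant(VARIANTS, time_budget_ms=8.0, power_budget_w=10.0))
```

In this sketch, tightening either budget shrinks the feasible set, so the selection falls back to a narrower or shallower variation. A real runtime would presumably refresh the time and power estimates as device conditions change rather than read them from a static table.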

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3689632

References: 43 articles.

1. Ron Banner, Yury Nahshan, and Daniel Soudry. 2019. Post training 4-bit quantization of convolutional networks for rapid-deployment. In Proceedings of the 33rd International Conference on Neural Information Processing Systems. 7950–7958. https://proceedings.neurips.cc/paper/2019/hash/c0a62e133894cdce435bcb4a5df1db2d-Abstract.html

2. Ali Ehteshami Bejnordi and Ralf Krestel. 2020. Dynamic channel and layer gating in convolutional neural networks. In KI 2020: Advances in Artificial Intelligence. Lecture Notes in Computer Science, Vol. 12325. Springer, 33–45. DOI: 10.1007/978-3-030-58285-2_3

3. Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John V. Guttag. 2020. What is the state of neural network pruning? In Proceedings of the Conference on Machine Learning and Systems (MLSys’20).

4. Tolga Bolukbasi, Joseph Wang, Ofer Dekel, and Venkatesh Saligrama. 2017. Adaptive neural networks for efficient inference. In Proceedings of the 34th International Conference on Machine Learning. 527–536. http://proceedings.mlr.press/v70/bolukbasi17a.html

5. Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, and Song Han. 2020. Once-for-all: Train one network and specialize it for efficient deployment. In Proceedings of the 8th International Conference on Learning Representations (ICLR’20). https://openreview.net/forum?id=HylxE1HKwS