Affiliation:
1. Nanjing University, China
Abstract
Recent advances in convolutional neural networks (CNNs) have revolutionized computer vision applications, and extensive research focuses on optimizing CNNs for efficient deployment on resource-limited devices. However, previous studies have several weaknesses, including limited support for diverse CNN structures, fixed scheduling strategies, overlapped computations, and high synchronization overheads. In this chapter, the authors introduce DeepSlicing, an adaptive inference system that addresses these challenges. It supports various CNNs, including GoogLeNet and ResNet models, and offers flexible fine-grained scheduling. DeepSlicing incorporates a proportional synchronized scheduler (PSS) that balances computation and synchronization. The authors implement DeepSlicing using PyTorch and evaluate it on an edge testbed of 8 heterogeneous Raspberry Pis. The results show reductions in inference latency of up to 5.79 times and in memory footprint of up to 14.72 times, demonstrating the efficacy of the proposed approach.