Affiliation:
1. School of Computer Science and Engineering, Central South University, China
2. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, China
3. School of Computer Science and Engineering, Central South University, China; Shenzhen Research Institute, Central South University, China
Abstract
Edge intelligence has emerged as a promising paradigm for accelerating DNN inference through model partitioning, and it is particularly useful in intelligent scenarios that demand both high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose significant challenges for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to allocate resources efficiently among multiple devices. In addition, most existing studies disregard the differing service requirements of DNN inference tasks, such as whether a task is highly accuracy-sensitive or highly latency-sensitive. To address these challenges, we propose Multi-Compression Scale DNN Inference Acceleration (MCIA), a framework based on cloud-edge-end collaboration. We model the problem as a mixed-integer multi-dimensional optimization problem that jointly optimizes the choice of DNN model version, the partition point, and the allocation of computational and bandwidth resources, so as to maximize the tradeoff between inference accuracy and latency according to the properties of the tasks. First, we train multiple versions of the DNN inference model with different compression scales in the cloud and deploy them to the end devices and the edge server. Then, a deep reinforcement learning-based algorithm jointly decides adaptive collaborative inference and resource allocation based on the deployed multi-compression-scale models and the task properties. Experimental results show that MCIA adapts to heterogeneous devices and dynamic networks and outperforms other methods.
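The abstract frames the decision as a mixed-integer multi-dimensional optimization over model version, partition point, and resources. As an illustration only, a minimal sketch of such a weighted accuracy-latency objective might read as follows; all symbols here (per-task weight w_i, version v_i, partition point p_i, compute share f_i, bandwidth share b_i) are assumptions for exposition, not notation taken from the paper:

\[
\max_{\{v_i,\,p_i,\,f_i,\,b_i\}} \; \sum_{i} \Big[\, w_i\, A(v_i) \;-\; (1 - w_i)\, T(v_i, p_i, f_i, b_i) \,\Big]
\quad \text{s.t.} \quad \sum_i f_i \le F_{\mathrm{edge}}, \qquad \sum_i b_i \le B,
\]

where $A(v_i)$ is the accuracy of the chosen model version, $T(\cdot)$ is the end-to-end inference latency of task $i$ under partition point $p_i$ and the allocated resources, $F_{\mathrm{edge}}$ is the edge server's computational capacity, and $B$ is the available bandwidth. The weight $w_i$ shifts the tradeoff toward accuracy or latency depending on each task's sensitivity, which is the role the abstract assigns to the task property.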
Funder
National Natural Science Foundation of China
Natural Science Foundation of Hunan Province
Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization
Shenzhen Science and Technology Program
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
References: 43 articles.
Cited by: 5 articles.