Affiliation:
1. School of Computer Science and Engineering, Central South University, China
2. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, China
3. School of Computer Science and Engineering, Central South University, China; Shenzhen Research Institute, Central South University, China
Abstract
Edge intelligence has emerged as a promising paradigm for accelerating DNN inference through model partitioning, and it is particularly useful in intelligent scenarios that demand both high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose significant challenges for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to allocate resources efficiently among multiple devices. In addition, most existing studies disregard the differing service requirements of DNN inference tasks, such as whether a task is highly accuracy-sensitive or highly latency-sensitive. To address these challenges, we propose Multi-Compression Scale DNN Inference Acceleration (MCIA), a framework based on cloud-edge-end collaboration. We model the problem as a mixed-integer multi-dimensional optimization problem that jointly optimizes the choice of DNN model version, the partition point, and the allocation of computational and bandwidth resources, so as to maximize the tradeoff between inference accuracy and latency according to the properties of the tasks. First, we train multiple versions of the DNN inference model with different compression scales in the cloud and deploy them to the end devices and the edge server. Then, a deep reinforcement learning-based algorithm jointly decides adaptive collaborative inference and resource allocation based on the deployed multi-compression-scale models and the task properties. Experimental results show that MCIA adapts to heterogeneous devices and dynamic networks and outperforms other methods.
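The abstract frames the decision as a mixed-integer multi-dimensional optimization over model version, partition point, and resources. As an illustration only, a minimal sketch of such a weighted accuracy-latency objective might read as follows; all symbols here (per-task weight w_i, version v_i, partition point p_i, compute share f_i, bandwidth share b_i) are assumptions for exposition, not notation taken from the paper:

\[
\max_{\{v_i,\,p_i,\,f_i,\,b_i\}} \; \sum_{i} \Big[\, w_i\, A(v_i) \;-\; (1 - w_i)\, T(v_i, p_i, f_i, b_i) \,\Big]
\quad \text{s.t.} \quad \sum_i f_i \le F_{\mathrm{edge}}, \qquad \sum_i b_i \le B,
\]

where $A(v_i)$ is the accuracy of the chosen model version, $T(\cdot)$ is the end-to-end inference latency of task $i$ under partition point $p_i$ and the allocated resources, $F_{\mathrm{edge}}$ is the edge server's computational capacity, and $B$ is the available bandwidth. The weight $w_i$ shifts the tradeoff toward accuracy or latency depending on each task's sensitivity, which is the role the abstract assigns to the task property.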
Funder
National Natural Science Foundation of China
Natural Science Foundation of Hunan Province
Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization
Shenzhen Science and Technology Program
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
References: 43 articles.
Cited by: 5 articles.