Multi-Compression Scale DNN Inference Acceleration based on Cloud-Edge-End Collaboration

Authors:

Qi Huamei¹, Ren Fang¹, Wang Leilei¹, Jiang Ping¹, Wan Shaohua², Deng Xiaoheng³

Affiliation:

1. School of Computer Science and Engineering, Central South University, China

2. Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, China

3. School of Computer Science and Engineering, Central South University, China and Shenzhen Research Institute, Central South University, China

Abstract

Edge intelligence has emerged as a promising paradigm for accelerating DNN inference through model partitioning, which is particularly useful for intelligent scenarios that demand high accuracy and low latency. However, the dynamic nature of the edge environment and the diversity of end devices pose significant challenges for DNN model partitioning strategies. Meanwhile, the limited resources of the edge server make it difficult to manage resource allocation efficiently across multiple devices. In addition, most existing studies disregard the differing service requirements of DNN inference tasks, such as high accuracy sensitivity or high latency sensitivity. To address these challenges, we propose Multi-Compression Scale DNN Inference Acceleration (MCIA) based on cloud-edge-end collaboration. We model this problem as a mixed-integer multi-dimensional optimization problem, jointly optimizing the DNN model version choice, the partitioning choice, and the allocation of computational and bandwidth resources to maximize the tradeoff between inference accuracy and latency according to the properties of the tasks. Initially, we train multiple versions of DNN inference models with different compression scales in the cloud and deploy them to the end devices and the edge server. Next, a deep reinforcement learning-based algorithm is developed for joint decision making on adaptive collaborative inference and resource allocation, based on the current multi-compression scale models and the task properties. Experimental results show that MCIA adapts to heterogeneous devices and dynamic networks, and outperforms other methods.
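To make the joint decision concrete, the following is a minimal illustrative sketch (not the paper's DRL algorithm) of the underlying search space: choosing a model version (compression scale) and a partition point to maximize a weighted accuracy-latency score. All per-layer FLOPs, feature-map sizes, accuracies, and device/edge/bandwidth figures below are hypothetical placeholders.

```python
# Hypothetical per-layer cost profiles for two compression scales of one DNN:
# "flops" in GFLOPs per layer, "out_mb" = size (MB) of each layer's output.
MODELS = {
    "full":       {"acc": 0.92, "flops": [1.2, 2.0, 1.5, 0.8], "out_mb": [6.0, 3.0, 1.0, 0.1]},
    "compressed": {"acc": 0.88, "flops": [0.6, 1.0, 0.7, 0.4], "out_mb": [3.0, 1.5, 0.5, 0.1]},
}

def latency(model, split, dev_gflops, edge_gflops, bw_mbps, input_mb=5.0):
    """End-to-end latency (s): layers [0, split) run on the device, the rest on the edge."""
    f = model["flops"]
    t_dev = sum(f[:split]) / dev_gflops
    t_edge = sum(f[split:]) / edge_gflops
    if split == len(f):        # fully on-device: nothing to transmit
        t_tx = 0.0
    elif split == 0:           # fully offloaded: transmit the raw input
        t_tx = input_mb * 8 / bw_mbps
    else:                      # transmit the intermediate feature map
        t_tx = model["out_mb"][split - 1] * 8 / bw_mbps
    return t_dev + t_tx + t_edge

def best_plan(w_acc, w_lat, dev=10.0, edge=100.0, bw=20.0):
    """Exhaustively pick the (version, split) maximizing w_acc*accuracy - w_lat*latency."""
    plans = []
    for name, m in MODELS.items():
        for split in range(len(m["flops"]) + 1):
            t = latency(m, split, dev, edge, bw)
            plans.append((w_acc * m["acc"] - w_lat * t, name, split, t))
    return max(plans)  # (score, version, split, latency_s)

print(best_plan(w_acc=1.0, w_lat=0.1))  # accuracy-sensitive weighting
print(best_plan(w_acc=1.0, w_lat=2.0))  # latency-sensitive weighting
```

Under these toy numbers, raising the latency weight pushes the decision toward the compressed model and earlier exit from the network transfer, which mirrors the accuracy/latency tradeoff MCIA optimizes; the actual paper replaces this exhaustive search with a DRL agent that also allocates bandwidth and compute across competing devices.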

Funder

National Natural Science Foundation of China

Natural Science Foundation of Hunan Province

Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization

Shenzhen Science and Technology Program

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

