Context-aware Adaptive Surgery

Author:

Wang Hongli¹, Guo Bin¹, Liu Jiaqi¹, Liu Sicong¹, Wu Yungang¹, Yu Zhiwen¹

Affiliation:

1. Northwestern Polytechnical University, Xi'an, Shaanxi, China

Abstract

Deep Neural Networks (DNNs) have made massive progress in many fields, and deploying DNNs on end devices has become an emerging trend that brings intelligence closer to users. However, it is challenging to deploy large-scale, computation-intensive DNNs on end devices, which are resource-constrained because they are small and lightweight. To this end, model partition, which splits a DNN into multiple parts so that several devices can compute collaboratively, has received extensive research attention. To find the optimal partition, most existing approaches must run from scratch under given resource constraints. However, they ignore that device resources (e.g., storage, battery power) and performance requirements (e.g., inference latency) often change continuously, so the optimal partition solution changes constantly during processing. It is therefore very important to reduce the tuning latency of model partition in order to adapt in real time to a changing processing context. To address these problems, we propose the Context-aware Adaptive Surgery (CAS) framework, which actively perceives the changing processing context and adaptively finds an appropriate partition solution in real time. Specifically, we construct a partition state graph that comprehensively models the different partition solutions of a DNN by incorporating context resources. We then propose "the neighbor effect", which provides a heuristic rule for the search process. When the processing context changes, CAS runs a runtime search algorithm, Graph-based Adaptive DNN Surgery (GADS), to quickly find an appropriate partition that satisfies the resource constraints under the guidance of the neighbor effect. The experimental results show that CAS adaptively and rapidly tunes the model partition solution on the order of 10 ms even for large DNNs (a 2.25x to 221.7x improvement in search time over state-of-the-art approaches), while total inference latency remains on par with the baselines.
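The idea of re-tuning a partition from the current solution, rather than from scratch, can be illustrated with a small sketch. This is not the paper's GADS algorithm or its partition state graph, neither of which the abstract specifies. It assumes a simple chain-structured DNN split at one point between device and edge, and it reads "the neighbor effect" as the assumption that a good partition after a context change lies near the previous one. All names (`Layer`, `Context`, `neighbor_search`) and all cost fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Layer:
    device_ms: float  # assumed latency of this layer on the end device
    edge_ms: float    # assumed latency of this layer on the edge server
    mem_mb: float     # assumed memory needed to host this layer on the device
    out_mb: float     # assumed size of this layer's output tensor


@dataclass
class Context:
    mem_budget_mb: float        # memory currently available on the device
    bandwidth_mb_per_ms: float  # current link bandwidth
    latency_target_ms: float    # current inference latency requirement


def latency(layers: List[Layer], k: int, input_mb: float, ctx: Context) -> float:
    """Latency when layers[:k] run on the device and layers[k:] on the edge."""
    device = sum(l.device_ms for l in layers[:k])
    edge = sum(l.edge_ms for l in layers[k:])
    if k == len(layers):
        transfer = 0.0  # everything runs locally, nothing is sent
    else:
        sent_mb = input_mb if k == 0 else layers[k - 1].out_mb
        transfer = sent_mb / ctx.bandwidth_mb_per_ms
    return device + transfer + edge


def feasible(layers: List[Layer], k: int, input_mb: float, ctx: Context) -> bool:
    """A partition is feasible if it fits in memory and meets the latency target."""
    mem = sum(l.mem_mb for l in layers[:k])
    return (mem <= ctx.mem_budget_mb
            and latency(layers, k, input_mb, ctx) <= ctx.latency_target_ms)


def neighbor_search(layers: List[Layer], current_k: int,
                    input_mb: float, ctx: Context) -> Optional[int]:
    """Search outward from the current partition point and return the
    nearest feasible one, or None if no partition point is feasible."""
    n = len(layers)
    for radius in range(n + 1):
        for k in sorted({current_k - radius, current_k + radius}):
            if 0 <= k <= n and feasible(layers, k, input_mb, ctx):
                return k
    return None
```

When the context is unchanged, the search returns the current partition at radius 0. When, for example, the memory budget shrinks, it inspects only the closest alternatives before settling, which is the intuition behind avoiding a from-scratch search. Note that this sketch returns the nearest feasible point, not necessarily the lowest-latency one.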

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications, Hardware and Architecture, Human-Computer Interaction

Cited by 8 articles.

1. SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget;IEEE Transactions on Mobile Computing;2024-09

2. AdaMEC: Towards a Context-adaptive and Dynamically Combinable DNN Deployment Framework for Mobile Edge Computing;ACM Transactions on Sensor Networks;2023-12-07

3. Content-Aware Adaptive Device–Cloud Collaborative Inference for Object Detection;IEEE Internet of Things Journal;2023-11-01

4. Adaptive Deep Inference Framework for Cloud-Edge Collaboration;2023 International Conference on Computers, Information Processing and Advanced Education (CIPAE);2023-08-26

5. Edge-Cloud Collaborated Object Detection via Difficult-Case Discriminator;2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS);2023-07
