Automated Backend Allocation for Multi-Model, On-Device AI Inference

Author:

Iyer Venkatraman (1), Lee Sungho (1), Lee Semun (1), Kim Juitem Joonwoo (1), Kim Hyunjun (1), Shin Youngjae (1)

Affiliation:

1. Samsung Electronics, Seoul, Republic of Korea

Abstract

On-device Artificial Intelligence (AI) services such as face recognition, object tracking, and voice recognition are being deployed at a rapidly growing scale on embedded, memory-constrained hardware. These services typically delegate AI inference models to CPU and GPU computing backends for execution. While GPU delegation is a common way to achieve high-speed computation, it suffers from degraded throughput and completion times in multi-model scenarios, i.e., when several services execute concurrently. This paper introduces a solution that sustains performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as over load-balancing scheduler solutions targeting multi-model scenarios.
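The abstract does not spell out the allocation algorithm, so the following is only a minimal sketch of the general idea of choosing a backend split from a latency/memory Pareto front. Everything in it is an assumption made for illustration: the `Config` record, the `cpu_share` field, the profiled numbers, and the selection rule (fastest point that fits a memory budget) are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """One hypothetical CPU/GPU split for a model, with profiled costs."""
    cpu_share: float   # assumed: fraction of the model placed on the CPU
    latency_ms: float  # assumed: profiled inference latency
    memory_mb: float   # assumed: profiled memory consumption


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep the configs that no other config beats on both objectives."""
    front: List[Config] = []
    # Sweep in order of increasing latency; a point survives only if it
    # uses strictly less memory than every faster point kept so far.
    for c in sorted(configs, key=lambda c: (c.latency_ms, c.memory_mb)):
        if not front or c.memory_mb < front[-1].memory_mb:
            front.append(c)
    return front


def pick(front: List[Config], memory_budget_mb: float) -> Config:
    """Fastest front point within the budget, else the smallest one."""
    feasible = [c for c in front if c.memory_mb <= memory_budget_mb]
    if feasible:
        return min(feasible, key=lambda c: c.latency_ms)
    return min(front, key=lambda c: c.memory_mb)


# Illustrative (made-up) profile of a single model.
configs = [
    Config(cpu_share=0.0, latency_ms=10.0, memory_mb=300.0),
    Config(cpu_share=0.5, latency_ms=14.0, memory_mb=220.0),
    Config(cpu_share=0.5, latency_ms=16.0, memory_mb=260.0),  # dominated
    Config(cpu_share=1.0, latency_ms=25.0, memory_mb=150.0),
]
front = pareto_front(configs)
```

In a feedback-driven setting, one could call `pick` again whenever the available memory changes, for example when another model is loaded, so that each model moves along its own front instead of staying on a static backend. Whether the paper's algorithm works this way is not stated in the abstract.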

Publisher

Association for Computing Machinery (ACM)
