Affiliation:
1. Samsung Electronics, Seoul, Republic of Korea
Abstract
On-device Artificial Intelligence (AI) services such as face recognition, object tracking and voice recognition are rapidly being deployed at scale on embedded, memory-constrained hardware devices. These services typically delegate AI inference models for execution on CPU and GPU computing backends. While GPU delegation is a common practice to achieve high-speed computation, the approach suffers from degraded throughput and completion times under multi-model scenarios, i.e., concurrently executing services. This paper introduces a solution to sustain performance in multi-model, on-device AI contexts by dynamically allocating a combination of CPU and GPU backends per model. The allocation is feedback-driven and guided by knowledge of model-specific, multi-objective Pareto fronts comprising inference latency and memory consumption. Our backend allocation algorithm runs online per model and achieves a 25-100% improvement in throughput over static allocations as well as over load-balancing scheduler solutions targeting multi-model scenarios.
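To make the allocation idea concrete, the following is a minimal sketch (not the paper's implementation) of a feedback-driven per-model backend choice, assuming each model has an offline-profiled Pareto front of (latency, memory) points per backend combination; names such as ParetoPoint, pick_backend, and the gpu_contention feedback signal are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ParetoPoint:
    backend: str       # e.g. "CPU", "GPU", "CPU+GPU"
    latency_ms: float  # profiled inference latency
    memory_mb: float   # profiled memory footprint

def pick_backend(front, memory_budget_mb, gpu_contention):
    """Pick the lowest-latency Pareto point that fits the memory budget.

    gpu_contention is a feedback signal in [0, 1]; profiled GPU latency is
    inflated by it to approximate slowdown under concurrent models.
    """
    best, best_latency = None, None
    for p in front:
        latency = p.latency_ms
        if "GPU" in p.backend:
            latency *= 1.0 + gpu_contention  # penalize a contended GPU
        if p.memory_mb <= memory_budget_mb:
            if best is None or latency < best_latency:
                best, best_latency = p, latency
    return best

# Hypothetical Pareto front for one model, re-evaluated each feedback cycle.
front = [
    ParetoPoint("GPU", 8.0, 220.0),
    ParetoPoint("CPU+GPU", 11.0, 160.0),
    ParetoPoint("CPU", 19.0, 90.0),
]
print(pick_backend(front, memory_budget_mb=180.0, gpu_contention=0.6))

Under these assumed numbers the GPU-only point exceeds the memory budget and the contention penalty is applied to GPU latencies, so the CPU+GPU combination is selected; in a real system the budget and contention signal would come from runtime measurements.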
Publisher
Association for Computing Machinery (ACM)