HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving

Author:

Mo Hao, Zhu Ligu, Shi Lei, Tan Songfu, Wang Suping

Abstract

To accelerate inference, clusters serving machine-learning (ML) models rely on expensive hardware accelerators (e.g., GPUs) to reduce execution time, and advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) cost-effectively. Autoscaling mechanisms that greedily minimize the number of service instances while ensuring SLO compliance are helpful, but we find that they are not sufficient to guarantee cost effectiveness across heterogeneous GPU hardware, nor do they maximize resource utilization. In this paper, we propose HetSev, which addresses these challenges by combining heterogeneity-aware autoscaling with resource-efficient scheduling. Our autoscaling mechanism accounts for both SLO compliance and GPU heterogeneity, provisioning the appropriate type and number of instances to guarantee cost effectiveness. We leverage multi-tenant inference to improve GPU resource utilization, while alleviating inter-tenant interference by avoiding the co-location of identical ML instances on the same GPU during placement. HetSev is integrated into Kubernetes and deployed on a heterogeneous GPU cluster. Evaluated with several representative ML models, HetSev reduces resource cost by up to 2.15× compared with default Kubernetes while meeting SLO requirements.
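The heterogeneity-aware provisioning idea can be sketched roughly as follows. This is a minimal illustration under assumed inputs, not the paper's actual algorithm: the `GpuType` structure, the `max_rate_in_slo` throughput model (peak request rate one instance can sustain within the SLO), and the price/throughput figures are all invented for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class GpuType:
    name: str
    cost_per_hour: float    # hourly price of one instance on this GPU type
    max_rate_in_slo: float  # peak req/s one instance serves within the SLO

def provision(gpu_types, demand_rps):
    """For each GPU type, compute the instance count needed to absorb
    `demand_rps` within the SLO, then return the cheapest option as
    (type name, instance count, total hourly cost)."""
    best = None
    for g in gpu_types:
        count = math.ceil(demand_rps / g.max_rate_in_slo)
        cost = count * g.cost_per_hour
        if best is None or cost < best[2]:
            best = (g.name, count, cost)
    return best

# Hypothetical catalog: a cheap, slower GPU and a fast, expensive one.
types = [
    GpuType("T4", 0.35, 120.0),
    GpuType("V100", 2.48, 400.0),
]
print(provision(types, 900.0))  # at this load, 8x T4 undercuts 3x V100
```

A real system would replace `max_rate_in_slo` with profiled latency/throughput curves per model and GPU type, but the cost comparison across heterogeneous hardware follows the same shape.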

Funder

National Key Research and Development Program

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

