DRS: A deep reinforcement learning enhanced Kubernetes scheduler for microservice‐based system

Published: 2023-10-25
ISSN:0038-0644
Container-title:Software: Practice and Experience
language:en
Short-container-title:Softw Pract Exp

Author:

Jian Zhaolong¹, Xie Xueshuo¹², Fang Yaozheng¹, Jiang Yibing¹, Lu Ye³, Dash Ankan⁴, Li Tao¹², Wang Guiling⁴

Affiliation:

1. College of Computer Science, Nankai University, Tianjin, China

2. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

3. College of Cyber Science, Nankai University, Tianjin, China

4. Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey, USA

Abstract

Summary: Kubernetes, the best-known container orchestration framework, is now widely used to manage and schedule resources for microservices in cloud-native distributed applications. However, its native scheduler, Kube-scheduler, preferentially places microservices on nodes whose CPU and memory resources are rich and balanced, considering each node individually. This may cause resource fragmentation and decrease resource utilization. In this paper, we propose a deep reinforcement learning enhanced Kubernetes scheduler named DRS. We first formulate the Kubernetes scheduling problem as a Markov decision process with carefully designed state, action, and reward structures, aiming to increase resource utilization and decrease load imbalance. Then, we design and implement the DRS monitor, which collects six parameters concerning resource utilization and builds a global view of all available resources. Finally, DRS automatically learns the scheduling policy through interaction with the Kubernetes cluster, without relying on expert knowledge about workload and cluster status. We implement a prototype of DRS in a five-node Kubernetes cluster and evaluate its performance. Experimental results show that DRS overcomes the shortcomings of Kube-scheduler and achieves the expected scheduling target on three workloads. With only 3.27% CPU overhead and 0.648% communication delay, DRS outperforms Kube-scheduler by 27.29% in resource utilization and reduces load imbalance by 2.90 times on average.
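
The Markov decision process framing in the abstract can be illustrated with a toy environment. The sketch below is not the authors' design: the abstract does not specify DRS's state, action, or reward, nor the six monitored parameters. Here it is assumed, purely for illustration, that the state is each node's CPU and memory utilization, the action is the index of the node that receives the next pod, and the reward is mean node utilization minus a weighted standard deviation across nodes (a stand-in for load imbalance). The names `Node`, `state`, `reward`, `step`, and `imbalance_weight` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Node:
    """A cluster node with CPU (cores) and memory (GiB) capacity. Hypothetical model."""
    cpu_capacity: float
    mem_capacity: float
    cpu_used: float = 0.0
    mem_used: float = 0.0

    def utilization(self) -> float:
        # Assumption: a node's load is the average of its CPU and memory utilization.
        return 0.5 * (self.cpu_used / self.cpu_capacity
                      + self.mem_used / self.mem_capacity)

    def fits(self, cpu: float, mem: float) -> bool:
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)


def state(nodes: list[Node]) -> list[float]:
    """Assumed state: per-node CPU and memory utilization, flattened."""
    return [x for n in nodes
            for x in (n.cpu_used / n.cpu_capacity, n.mem_used / n.mem_capacity)]


def reward(nodes: list[Node], imbalance_weight: float = 1.0) -> float:
    """Assumed reward: higher mean utilization is good, imbalance is penalized."""
    loads = [n.utilization() for n in nodes]
    return mean(loads) - imbalance_weight * pstdev(loads)


def step(nodes: list[Node], action: int, cpu: float, mem: float):
    """Place a pod requesting (cpu, mem) on nodes[action]; return (state, reward)."""
    node = nodes[action]
    if not node.fits(cpu, mem):
        # Assumption: an infeasible placement leaves the cluster unchanged
        # and receives a fixed penalty.
        return state(nodes), -1.0
    node.cpu_used += cpu
    node.mem_used += mem
    return state(nodes), reward(nodes)
```

With two identical nodes, placing the second pod on the idle node removes the imbalance penalty and so yields a higher reward than the first placement did. A learning agent would interact with such an environment repeatedly to improve its placement policy; the learning algorithm itself is outside this sketch.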

Funder

National Key Research and Development Program of China

Publisher

Wiley

Subject

Software

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/spe.3284

Cited by 5 articles.

1. Telemetry-Driven Microservices Orchestration in Cloud-Edge Environments;2024 IEEE 17th International Conference on Cloud Computing (CLOUD);2024-07-07

2. Graph Attention Networks and Deep Q-Learning for Service Mesh Optimization: A Digital Twinning Approach;ICC 2024 - IEEE International Conference on Communications;2024-06-09

3. CAROKRS: Cost-Aware Resource Optimization Kubernetes Resource Scheduler;2024 9th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA);2024-04-25

4. Development of a Query Delay Injection System for the MEC Simulator of the LWMECPS Platform;2024 International Russian Smart Industry Conference (SmartIndustryCon);2024-03-25

5. ODRL: Reinforcement Learning in Priority Scheduling for Running Cost Optimization;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17