Lavender: An Efficient Resource Partitioning Framework for Large-Scale Job Colocation

Author:

Peng Wangqi (1), Li Yusen (2), Liu Xiaoguang (3), Wang Gang (3)

Affiliation:

1. SysNet of Nankai University, Nankai University, Tianjin, China

2. School of Computer Science, Nankai University, Tianjin, China

3. School of Computer Science, Nankai University, Tianjin, China

Abstract

Workload consolidation is widely used to improve resource utilization in modern data centers. However, running multiple jobs concurrently on a shared server introduces contention for shared resources such as CPU cores, last-level cache (LLC), and memory bandwidth. This contention degrades job performance and can significantly reduce throughput. To mitigate it, software- or hardware-level resource isolation techniques can be used to partition the shared resources among colocated jobs. However, existing resource partitioning solutions often assume that only a small number of jobs are colocated, which makes them unsuitable for large-scale job colocation due to several critical challenges. In this study, we propose Lavender, a framework designed specifically for large-scale resource partitioning problems. Lavender incorporates several key techniques to tackle these challenges while ensuring efficiency, adaptivity, and optimality. We conducted comprehensive evaluations of Lavender to validate its performance and analyze the reasons for its advantages. The experimental results demonstrate that Lavender significantly outperforms state-of-the-art baselines. Lavender is publicly available at https://github.com/yanxiaoqi932/OpenSourceLavender.

Funder

National Science Foundation of China

Fundamental Research Funds for the Central Universities; NSF of Tianjin

Publisher

Association for Computing Machinery (ACM)

