Improving the Performance of Heterogeneous Data Centers through Redundancy

Author:

Anton Elene1, Ayesta Urtzi2, Jonckheere Matthieu3, Verloop Ina Maria1

Affiliation:

1. CNRS - IRIT & Université de Toulouse INP, Toulouse, France

2. CNRS - IRIT & Université de Toulouse INP & IKERBASQUE & University of the Basque Country, Toulouse, France

3. Universidad de Buenos Aires, Buenos Aires, Argentina

Abstract

We analyze the performance of redundancy in a multi-type job and multi-type server system. We assume the job dispatcher is unaware of the servers' capacities, and we set out to study under which circumstances redundancy improves performance. With redundancy, an arriving job dispatches redundant copies to all its compatible servers and departs as soon as one of its copies completes service. As a benchmark, we take the non-redundant system in which an arriving job is routed to only one randomly selected compatible server. Service times are generally distributed and all copies of a job are identical, i.e., have the same service requirement. In our first main result, we characterize the necessary and sufficient stability condition of the redundancy system. This condition coincides with that of a system where each job type only dispatches copies to its least-loaded servers, and those copies need to be fully served. In our second result, we compare the stability region of the system under redundancy to that of the system without redundancy. We show that if the servers' capacities are sufficiently heterogeneous, the stability region under redundancy can be much larger than that without redundancy. We apply the general solution to particular classes of systems, including redundancy-d and nested models, to derive simple conditions on the degree of heterogeneity required for redundancy to improve stability. As such, our result is the first to show that redundancy can improve the stability, and hence the performance, of a system when copies are non-i.i.d.
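The non-redundant benchmark described in the abstract can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes Poisson arrivals at total rate lam, a unit mean service requirement, and that each server is stable exactly when the load routed to it is below its capacity. The function name and the input layout (a mapping from a tuple of compatible servers to the probability of that job type) are hypothetical. It does not compute the redundancy stability condition, whose exact form the abstract does not state.

```python
from fractions import Fraction


def max_stable_rate_random_routing(type_probs, capacities):
    """Supremum of arrival rates that keep every server stable when each job
    is routed to one compatible server chosen uniformly at random.

    type_probs: {tuple_of_compatible_servers: probability_of_that_job_type}
    capacities: {server: capacity}
    Assumption (not from the source): unit mean service requirement, so
    server s is stable iff lam * share[s] < capacities[s].
    """
    # Fraction of all arrivals that ends up at each server.
    share = {s: Fraction(0) for s in capacities}
    for servers, prob in type_probs.items():
        for s in servers:
            share[s] += Fraction(prob) / len(servers)

    # The most constrained server determines the threshold.
    # Servers that receive no traffic impose no constraint.
    return min(
        Fraction(capacities[s]) / share[s]
        for s in capacities
        if share[s] > 0
    )


# One job type compatible with both servers; capacities 1 and 4.
threshold = max_stable_rate_random_routing({(1, 2): 1}, {1: 1, 2: 4})
print(threshold)
```

In this example the total capacity is 5, but random routing sends half of the jobs to the slow server, so under the stated assumptions the benchmark is stable only for arrival rates below 2. This is the kind of capacity heterogeneity the abstract refers to when comparing the two stability regions.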

Funder

Agence Nationale de la Recherche

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Cited by 9 articles.

1. Multi-dimensional State Space Collapse in Non-complete Resource Pooling Scenarios;Proceedings of the ACM on Measurement and Analysis of Computing Systems;2024-05-21

2. Efficient scheduling in redundancy systems with general service times;Queueing Systems;2024-03-22

3. Multi Resource Scheduling with Task Cloning in Heterogeneous Clusters;Proceedings of the 51st International Conference on Parallel Processing;2022-08-29

4. Power- and QoS-Aware Job Assignment With Dynamic Speed Scaling for Cloud Data Center Computing;IEEE Access;2022

5. The cost of collaboration;Queueing Systems;2021-11-02
