Network-aware compute and memory allocation in optically composable data centers with deep reinforcement learning and graph neural networks

Authors:

Shabka Zacharaya, Zervas Georgios

Abstract

Composable data center architectures promise a means of pooling resources remotely within data centers, offering greater flexibility and resource efficiency for the increasingly important infrastructure-as-a-service business. This can be accomplished by using an optically circuit-switched backbone in the data center network (DCN), which provides the bandwidth and latency guarantees required for reliable performance when applications run across non-local resource pools. However, resource allocation in this scenario requires both server-level and network-level resources to be co-allocated to requests. The online nature and underlying combinatorial complexity of this problem, together with the typical scale of DCN topologies, make exact solutions intractable and heuristic-based solutions sub-optimal or non-intuitive to design. We demonstrate that deep reinforcement learning, where the policy is modeled by a graph neural network, can be used to learn effective network-aware and topologically scalable allocation policies end-to-end. Compared to state-of-the-art heuristics for network-aware resource allocation, the method achieves up to a 20% higher acceptance ratio, achieves the same acceptance ratio as the best-performing heuristic with 3× fewer networking resources available, and maintains its overall performance when directly applied (with no further training) to DCN topologies with 10²× more servers than the topologies seen during training.
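The abstract does not describe the network architecture, so the following is only a minimal sketch, under stated assumptions, of why a graph neural network policy can be applied to larger topologies without retraining. It assumes a single mean-aggregation message-passing layer, three hypothetical per-server features (`free_cpu`, `free_mem`, `free_link_bw`), a toy ring topology, and random untrained weights. None of these come from the paper. Every node shares the same weights, so the same policy produces a masked softmax over servers for a 4-server graph and a 400-server graph alike; servers that cannot fit the request are masked out, and an all-zero output stands for rejecting the request.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def mean_aggregate(features: Matrix, neighbours: Sequence[int]) -> List[float]:
    """Average the feature vectors of a node's neighbours (zeros if it has none)."""
    dim = len(features[0])
    if not neighbours:
        return [0.0] * dim
    return [sum(features[u][k] for u in neighbours) / len(neighbours) for k in range(dim)]


def gnn_layer(features: Matrix, adjacency: List[List[int]],
              w_self: Matrix, w_neigh: Matrix) -> Matrix:
    """One message-passing round: h_v = ReLU(W_self x_v + W_neigh * mean of neighbour x_u).

    Every node uses the same weights, so the layer runs unchanged on graphs of any size.
    """
    hidden = []
    for v, x in enumerate(features):
        agg = mean_aggregate(features, adjacency[v])
        h = []
        for row_s, row_n in zip(w_self, w_neigh):
            z = sum(a * b for a, b in zip(row_s, x)) + sum(a * b for a, b in zip(row_n, agg))
            h.append(max(0.0, z))  # ReLU
        hidden.append(h)
    return hidden


def allocation_policy(features: Matrix, adjacency: List[List[int]],
                      request: Tuple[float, float],
                      w_self: Matrix, w_neigh: Matrix, w_out: List[float]) -> List[float]:
    """Masked softmax over servers: probability of placing `request` on each one.

    Hypothetical features per server: [free_cpu, free_mem, free_link_bw] in [0, 1].
    Servers without enough free CPU or memory get probability 0; all zeros means reject.
    """
    hidden = gnn_layer(features, adjacency, w_self, w_neigh)
    scores = [sum(a * b for a, b in zip(w_out, h)) for h in hidden]
    feasible = [f[0] >= request[0] and f[1] >= request[1] for f in features]
    if not any(feasible):
        return [0.0] * len(features)
    top = max(s for s, ok in zip(scores, feasible) if ok)  # for numerical stability
    exps = [math.exp(s - top) if ok else 0.0 for s, ok in zip(scores, feasible)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_server(probs: List[float], rng: random.Random) -> Optional[int]:
    """Sample a server index from the policy, or None if the request is rejected."""
    if not any(probs):
        return None
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def ring(n: int) -> List[List[int]]:
    """Toy topology (hypothetical): each server linked to its two ring neighbours."""
    return [[(v - 1) % n, (v + 1) % n] for v in range(n)]


# Random, untrained weights (3 input features -> 4 hidden units -> 1 score).
rng = random.Random(0)
W_SELF = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W_NEIGH = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
W_OUT = [rng.uniform(-1, 1) for _ in range(4)]

small = [[0.9, 0.8, 0.5], [0.1, 0.9, 0.7], [0.6, 0.6, 0.2], [0.4, 0.3, 0.9]]
probs_small = allocation_policy(small, ring(4), (0.5, 0.5), W_SELF, W_NEIGH, W_OUT)

# The same weights applied, with no retraining, to a topology with 100x more servers.
large = [[rng.random() for _ in range(3)] for _ in range(400)]
large[0] = [1.0, 1.0, 1.0]  # guarantee at least one feasible server
probs_large = allocation_policy(large, ring(400), (0.5, 0.5), W_SELF, W_NEIGH, W_OUT)
choice = sample_server(probs_large, rng)
```

In this sketch, network awareness is reduced to a single link-bandwidth feature. A real co-allocation step would also have to reserve optical circuits between the chosen resources. Training the weights end-to-end with a reinforcement-learning objective (for example, a policy-gradient method rewarded for accepted requests) is not shown here.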

Funder

Engineering and Physical Sciences Research Council

Innovate UK

Publisher

Optica Publishing Group

Subject

Computer Networks and Communications

Cited by 2 articles.

1. RISA: Round-Robin Intra-Rack Friendly Scheduling Algorithm for Disaggregated Datacenters;Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis;2023-11-12

2. Recent Scientific Achievements and Developments in Software Defined Networking: A Survey;2023 1st International Conference on Circuits, Power and Intelligent Systems (CCPIS);2023-09-01
