Mars: Near-Optimal Throughput with Shallow Buffers in Reconfigurable Datacenter Networks

Authors:

Addanki Vamsi 1, Avin Chen 2, Schmid Stefan 1

Affiliations:

1. TU Berlin, Berlin, Germany

2. Ben-Gurion University of the Negev, Be'er Sheva, Israel

Abstract

The performance of large-scale computing systems often depends critically on high-performance communication networks. Dynamically reconfigurable topologies, e.g., based on optical circuit switches, are emerging as a new technology to deal with the explosive growth of datacenter traffic. Specifically, periodic reconfigurable datacenter networks (RDCNs) such as RotorNet (SIGCOMM 2017), Opera (NSDI 2020) and Sirius (SIGCOMM 2020) have been shown to provide high throughput by emulating a complete graph through fast periodic circuit switch scheduling. However, to achieve such high throughput, existing reconfigurable network designs pay a high price: not only in potentially high delays but also, as we show as a first contribution in this paper, in high buffer requirements. In particular, we show that under buffer constraints, emulating the high-throughput complete graph is infeasible at scale, and we uncover a spectrum of unvisited and attractive alternative RDCNs, which emulate regular graphs with lower node degree than the complete graph. We present Mars, a periodic reconfigurable topology which emulates a d-regular graph with near-optimal throughput. In particular, we systematically analyze how the degree d can be optimized for throughput given the available buffer and delay tolerance of the datacenter. We further show empirically that Mars achieves higher throughput than existing systems when buffer sizes are bounded.
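To make the idea of "emulating" a graph through periodic circuit scheduling concrete, here is a minimal Python sketch. It is a toy model for illustration only and is not the construction used by Mars or by any of the systems named in the abstract. The function names (`periodic_schedule`, `emulated_graph`, `out_degrees`) and the choice of cyclic-shift matchings are assumptions. In each time slot every node has one outgoing circuit, and the union of the circuits over one period is the emulated graph.

```python
def periodic_schedule(n, shifts):
    """One period of a toy schedule (assumed model, not Mars itself).

    In the slot for shift s, node i has a circuit to node (i + s) % n,
    so every slot is a permutation (one outgoing, one incoming link per node).
    """
    return [[(i, (i + s) % n) for i in range(n)] for s in shifts]


def emulated_graph(schedule):
    """Union of all directed circuits that appear during one period."""
    edges = set()
    for matching in schedule:
        edges.update(matching)
    return edges


def out_degrees(n, edges):
    """Number of distinct neighbors each node reaches within one period."""
    deg = [0] * n
    for u, _ in edges:
        deg[u] += 1
    return deg


n = 8  # number of nodes (e.g., top-of-rack switches); illustrative value

# Complete-graph emulation: cycle through all n - 1 non-zero shifts.
complete_schedule = periodic_schedule(n, range(1, n))
complete = emulated_graph(complete_schedule)

# Lower-degree emulation: cycle through only d shifts (here d = 2).
d = 2
sparse_schedule = periodic_schedule(n, [1, 2])
sparse = emulated_graph(sparse_schedule)

print(len(complete_schedule), out_degrees(n, complete))  # period 7, degree 7 everywhere
print(len(sparse_schedule), out_degrees(n, sparse))      # period 2, degree 2 everywhere
```

In this toy model the period length equals the emulated degree, so a node waits longer for a particular circuit when the emulated graph is complete than when it is d-regular with small d. This is only a rough intuition for the delay and buffer trade-off described in the abstract, not the paper's analysis.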

Funder

European Research Council

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

References: 74 articles.

1. Arjun Singh, Joon Ong, Amit Agarwal, Glen Anderson, Ashby Armistead, Roy Bannon, Seb Boving, Gaurav Desai, Bob Felderman, Paulie Germano, Anand Kanagala, Jeff Provost, Jason Simmons, Eiichi Tanda, Jim Wanderer, Urs Hölzle, Stephen Stuart, and Amin Vahdat. Jupiter rising: A decade of Clos topologies and centralized control in Google's datacenter network. In Proceedings of the ACM SIGCOMM 2015 Conference, pages 183--197, 2015.

2. Hitesh Ballani, Paolo Costa, Raphael Behrendt, Daniel Cletheroe, Istvan Haller, Krzysztof Jozwik, Fotini Karinou, Sophie Lange, Kai Shi, Benn Thomsen, and Hugh Williams. Sirius: A flat datacenter network with nanosecond optical switching. In Proceedings of the ACM SIGCOMM 2020 Conference, pages 782--797, 2020.

3. William M. Mellette, Rob McGuinness, Arjun Roy, Alex Forencich, George Papen, Alex C. Snoeren, and George Porter. RotorNet: A scalable, low-complexity, optical datacenter network. In Proceedings of the ACM SIGCOMM 2017 Conference, pages 267--280, 2017.

4. William M. Mellette, Rajdeep Das, Yibo Guo, Rob McGuinness, Alex C. Snoeren, and George Porter. Expanding across time to deliver bandwidth efficiency and low latency. In 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), pages 1--18, Santa Clara, CA, February 2020. USENIX Association.

5. Helios

Cited by 9 articles.

1. Shale: A Practical, Scalable Oblivious Reconfigurable Network;Proceedings of the ACM SIGCOMM 2024 Conference;2024-08-04

2. Uniform-Cost Multi-Path Routing for Reconfigurable Data Center Networks;Proceedings of the ACM SIGCOMM 2024 Conference;2024-08-04

3. Breaking the VLB Barrier for Oblivious Reconfigurable Networks;Proceedings of the 56th Annual ACM Symposium on Theory of Computing;2024-06-10

4. Beyond matchings: Dynamic multi-hop topology for demand-aware datacenters;Computer Networks;2024-02

5. Optimizing Reconfigurable Optical Datacenters: The Power of Randomization;Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis;2023-11-11
