Affiliation:
1. Cleversafe, an IBM Company
2. Illinois Institute of Technology
Abstract
As parallel and distributed systems evolve toward extreme scale (for example, high-performance computing systems with millions of cores and billion-way parallelism, and high-capacity storage systems that require efficient access to petabytes or exabytes of data), they pose many new challenges for the design and deployment of next-generation interconnection networks. Fat-tree networks have been widely used in both data centers and high-performance computing (HPC) systems over the past decades and are promising candidates for next-generation extreme-scale networks. In this article, we present FatTreeSim, a simulation framework that supports the modeling and simulation of extreme-scale fat-tree networks. Its goals are to help understand the design constraints of next-generation HPC and distributed systems and to aid the design and performance optimization of the applications running on these systems. We have systematically evaluated FatTreeSim on Emulab and Blue Gene/Q and analyzed its scalability and fidelity under various network configurations. On the Blue Gene/Q system Mira, FatTreeSim achieves a peak performance of 305 million events per second using 16,384 cores. Finally, we have applied FatTreeSim to simulate several large-scale Hadoop YARN applications to demonstrate its usability.
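To give a sense of how fat-tree size grows with switch radix, the following is a minimal sketch that counts switches and hosts in the common three-tier fat-tree built from identical k-port switches. This is an illustrative assumption: the abstract does not state which fat-tree parameterization FatTreeSim uses, and `fat_tree_size` and `FatTreeSize` are hypothetical names, not part of FatTreeSim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatTreeSize:
    """Element counts for an assumed three-tier k-ary fat-tree (hypothetical helper)."""
    k: int                     # ports per switch (must be even)
    pods: int
    edge_switches: int
    aggregation_switches: int
    core_switches: int
    hosts: int

    @property
    def total_switches(self) -> int:
        return self.edge_switches + self.aggregation_switches + self.core_switches


def fat_tree_size(k: int) -> FatTreeSize:
    """Count switches and hosts in a three-tier fat-tree of k-port switches.

    Assumed construction (not taken from FatTreeSim): k pods, each with
    k/2 edge and k/2 aggregation switches; each edge switch serves k/2
    hosts; (k/2)^2 core switches tie the pods together.
    """
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even integer >= 2")
    half = k // 2
    return FatTreeSize(
        k=k,
        pods=k,
        edge_switches=k * half,
        aggregation_switches=k * half,
        core_switches=half * half,
        hosts=k * half * half,  # equals k^3 / 4
    )


if __name__ == "__main__":
    # Host count grows cubically and switch count quadratically in k.
    for k in (4, 16, 48, 64):
        s = fat_tree_size(k)
        print(f"k={k:>3}: hosts={s.hosts:>7}, switches={s.total_switches:>5}")
```

Under this assumed construction, 48-port switches already yield 27,648 hosts and 2,880 switches, which illustrates why simulating extreme-scale fat-trees calls for a parallel simulator rather than a sequential one.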
Funder
Maryland Procurement Office
Air Force Office of Scientific Research
Office of Science of the U.S. Department of Energy
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, Modelling and Simulation
Cited by
5 articles.
1. SimMSG: Simulating Transportation of MPI Messages in High Performance Computing Systems; 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys); 2023-12-17
2. BurstBalancer: Do Less, Better Balance for Large-scale Data Center Traffic; IEEE Transactions on Parallel and Distributed Systems; 2023
3. Improved Power of Two Choices for Fat-Tree Routing; IEEE Transactions on Network and Service Management; 2018-12
4. Large Scale Data Centers Simulation Based on Baseline Test Model; 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW); 2018-05
5. Guest Editorial for the TOMACS Special Issue on the Principles of Advanced Discrete Simulation (PADS); ACM Transactions on Modeling and Computer Simulation; 2017-07-06