Affiliation:
1. University of Extremadura, Spain
2. University College Dublin, Ireland
Abstract
This survey presents the state of the art in analytic communication performance models, providing sufficiently detailed descriptions of particularly noteworthy efforts. Modeling the cost of communications in computer clusters is an important and challenging problem. Such modeling provides insights into the design of the communication patterns of parallel scientific applications and mathematical kernels and lays clear groundwork for optimizing their deployment on increasingly complex high-performance computing infrastructure. The survey provides background information on how different performance models represent the underlying platform and shows the evolution of these models over time, from early clusters of single-core processors to present-day multi-core and heterogeneous platforms. It concludes with prospective directions for future research in analytic communication performance modeling.
Funder
Science Foundation Ireland
European Regional Development Fund "A way to achieve Europe"
Extremadura Local Government
EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing
Publisher
Association for Computing Machinery (ACM)
Subject
General Computer Science, Theoretical Computer Science
Cited by
24 articles.
1. Efficient Inter-Datacenter AllReduce With Multiple Trees;IEEE Transactions on Network Science and Engineering;2024-09
2. Graph Computation with Adaptive Granularity;2024 IEEE 40th International Conference on Data Engineering (ICDE);2024-05-13
3. 3D Parallelism for Transformers via Integer Programming;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14
4. Network states-aware collective communication optimization;Cluster Computing;2024-03-10
5. SUARA: A scalable universal allreduce communication algorithm for acceleration of parallel deep learning applications;Journal of Parallel and Distributed Computing;2024-01