Affiliation:
1. University of Illinois at Urbana-Champaign, Urbana, IL, USA
Abstract
Known algorithms for two important collective communication operations, allgather and reduce-scatter, are minimal-communication algorithms; no process sends or receives more than the minimum amount of data. This, combined with the data-ordering semantics of the operations, limits the flexibility and performance of these algorithms. Our novel non-minimal, topology-aware algorithms deliver far better performance with the addition of a very small amount of redundant communication. We develop novel algorithms for Clos networks and single or multi-ported torus networks. Tests on a 32k-node BlueGene/P result in allgather speedups of up to 6x and reduce-scatter speedups of over 11x compared to the native IBM algorithm. Broadcast, reduce, and allreduce can be composed of allgather or reduce-scatter and other collective operations; our techniques also improve the performance of these algorithms.
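The composition mentioned at the end of the abstract can be illustrated with a small single-machine simulation. This is only a sketch of the standard semantics of the operations, not the paper's topology-aware algorithms: processes are modeled as entries of a Python list, the function names are hypothetical, and the vector length is assumed to be divisible by the number of processes. It shows allreduce built from a reduce-scatter followed by an allgather, with blocks kept in rank order as the data-ordering semantics require.

```python
from typing import List


def reduce_scatter(data: List[List[int]]) -> List[List[int]]:
    """data[p] is the full input vector held by process p.

    Returns one block per process: process p receives the element-wise
    sum, over all processes, of block p of the input vectors.
    Assumes len(data[0]) is divisible by the number of processes.
    """
    nprocs = len(data)
    block = len(data[0]) // nprocs
    result = []
    for p in range(nprocs):
        lo, hi = p * block, (p + 1) * block
        result.append([sum(vec[i] for vec in data) for i in range(lo, hi)])
    return result


def allgather(blocks: List[List[int]]) -> List[List[int]]:
    """blocks[p] is the block contributed by process p.

    Every process ends up with all blocks concatenated in rank order.
    """
    full = [x for b in blocks for x in b]
    return [list(full) for _ in blocks]


def allreduce(data: List[List[int]]) -> List[List[int]]:
    """Allreduce composed as reduce-scatter followed by allgather."""
    return allgather(reduce_scatter(data))


if __name__ == "__main__":
    inputs = [[1, 2, 3, 4], [10, 20, 30, 40]]  # two simulated processes
    print(reduce_scatter(inputs))  # one reduced block per process
    print(allreduce(inputs))       # every process holds the full sum
```

In this model any speedup in the allgather or reduce-scatter step carries over directly to the composed allreduce, which is the sense in which the abstract says the techniques also help the other collectives.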
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
Cited by
25 articles.
1. Efficient Algorithm for All-Gather Operation in Optical Interconnect Systems;IEEE Open Journal of the Communications Society;2024
2. TH-Allreduce: Optimizing Small Data Allreduce Operation on Tianhe System;2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS);2023-12-17
3. Generalized Collective Algorithms for the Exascale Era;2023 IEEE International Conference on Cluster Computing (CLUSTER);2023-10-31
4. Wrht: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems;Proceedings of the 52nd International Conference on Parallel Processing;2023-08-07
5. FMI: Fast and Cheap Message Passing for Serverless Functions;Proceedings of the 37th International Conference on Supercomputing;2023-06-21