Affiliation:
1. IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120, U.S.A.
Abstract
We present a class of efficient algorithms for global combine operations in k-port message-passing systems. In the k-port communication model, in each communication round, a processor can send data to k other processors and simultaneously receive data from k other processors. We consider algorithms for global combine operations in n processors with respect to a commutative and associative reduction function. Initially, each processor holds a vector of m data items and finally the result of the reduction function over the n vectors of data items, which is also a vector of m data items, is known to all n processors. We present three efficient algorithms that employ various trade-offs between the number of communication rounds and the number of data items transferred in sequence. For the case m=1, we have an algorithm which is optimal in both the number of communication rounds and the number of data items transferred in sequence.
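As a rough illustration of the k-port model described in the abstract, the sketch below simulates a global combine for the case m=1. It is not one of the paper's three algorithms: it is a simple digit-exchange scheme that assumes n is an exact power of k+1, and the function name `kport_allreduce` and its parameters are hypothetical. In each round every processor exchanges its partial result with the k processors whose index differs from its own only in one base-(k+1) digit, so all n processors hold the full result after log_{k+1} n rounds.

```python
from functools import reduce
import operator


def kport_allreduce(values, k, op=operator.add):
    """Simulate a k-port global combine with one data item per processor.

    Assumption (not from the paper): n = len(values) is a power of k + 1.
    Returns (final value held by each processor, number of rounds used).
    """
    n = len(values)
    base = k + 1

    # Find d such that base**d == n, rejecting any other n.
    d, size = 0, 1
    while size < n:
        size *= base
        d += 1
    if size != n:
        raise ValueError("this sketch requires n to be a power of k + 1")

    current = list(values)
    stride = 1
    for _ in range(d):
        nxt = []
        for p in range(n):
            # Processors that share every base-(k+1) digit with p except
            # the digit handled this round: p itself plus its k partners.
            digit = (p // stride) % base
            group_start = p - digit * stride
            members = [group_start + j * stride for j in range(base)]
            # p sends its value to the k partners and combines the k
            # values it receives with its own.
            nxt.append(reduce(op, (current[q] for q in members)))
        current = nxt
        stride *= base
    return current, d
```

For example, `kport_allreduce(list(range(9)), k=2)` combines nine values in two rounds, whereas a 1-port exchange over eight processors (`k=1`) needs three.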
Publisher
World Scientific Pub Co Pte Lt
Subject
Hardware and Architecture, Theoretical Computer Science, Software
Cited by
15 articles.
1. Uniform Algorithms for Reduce-scatter and (most) other Collectives for MPI;2023 IEEE International Conference on Cluster Computing (CLUSTER);2023-10-31
2. An optimisation of allreduce communication in message-passing systems;Parallel Computing;2021-10
3. DGCL;Proceedings of the Sixteenth European Conference on Computer Systems;2021-04-21
4. Non-clairvoyant reduction algorithms for heterogeneous platforms;Concurrency and Computation: Practice and Experience;2014-07-30
5. Bandwidth optimal all-reduce algorithms for clusters of workstations;Journal of Parallel and Distributed Computing;2009-02