Authors: ANTONESCU Mihai, ȘTEFAN Gheorghe
Abstract
Some parallel computing patterns can be accelerated using appropriate networks of simple circuits. We propose a solution based on the Beneš-Waksman permutation network, adapted to efficiently accelerate not only permutation but also some of the most frequently used parallel computational patterns: pack, prefix operations, and reductions. The structural context considered for deploying our circuit is the map parallel pattern, represented by an array of computational elements. The developed network receives a vector from the map array and, for functions such as permute, pack, and prefix sum, outputs a vector (thus closing a first global loop over the map array of cells). For reduction functions (add, min, max) the network returns a scalar (thus closing a second global loop over the map array). With these improvements, the network adds circuit support for frequently used functions beyond the map-type functions performed in the array of computing elements. While the frame of the permutation network is easily adapted for reduction functions, new forms of implementation are proposed for the prefix functions and for the pack function. The cells of the Beneš-Waksman network are redesigned to support the additional functionality. Several applications are then presented to emphasize the utility of our design.
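As a purely illustrative aid (not part of the paper itself), the following minimal C sketch gives a sequential, functional reference model of the four operations the adapted network is meant to accelerate: permute, pack, prefix sum, and add-reduction. The function names, the fixed vector length N, and the loop formulation are assumptions made for this example; the paper's contribution is a circuit, not software.

#include <stdio.h>

#define N 8

/* permute: out[i] = in[p[i]], where p is a permutation of 0..N-1 */
static void permute(const int *in, const int *p, int *out) {
    for (int i = 0; i < N; i++)
        out[i] = in[p[i]];
}

/* pack: gather the elements whose flag is set, preserving order;
   returns the number of packed elements */
static int pack(const int *in, const int *flag, int *out) {
    int k = 0;
    for (int i = 0; i < N; i++)
        if (flag[i])
            out[k++] = in[i];
    return k;
}

/* inclusive prefix sum: out[i] = in[0] + ... + in[i] */
static void prefix_sum(const int *in, int *out) {
    int acc = 0;
    for (int i = 0; i < N; i++) {
        acc += in[i];
        out[i] = acc;
    }
}

/* add-reduction: returns the scalar sum of all elements */
static int reduce_add(const int *in) {
    int acc = 0;
    for (int i = 0; i < N; i++)
        acc += in[i];
    return acc;
}

int main(void) {
    int v[N]    = {3, 1, 4, 1, 5, 9, 2, 6};
    int p[N]    = {7, 6, 5, 4, 3, 2, 1, 0};   /* reversal permutation */
    int flag[N] = {1, 0, 1, 0, 1, 0, 1, 0};   /* keep even positions */
    int out[N];

    permute(v, p, out);
    printf("permute:    ");
    for (int i = 0; i < N; i++) printf("%d ", out[i]);
    printf("\n");

    int k = pack(v, flag, out);
    printf("pack:       ");
    for (int i = 0; i < k; i++) printf("%d ", out[i]);
    printf("\n");

    prefix_sum(v, out);
    printf("prefix sum: ");
    for (int i = 0; i < N; i++) printf("%d ", out[i]);
    printf("\n");

    printf("reduce add: %d\n", reduce_add(v));
    return 0;
}

In the hardware described by the abstract, each of these loops corresponds to a single pass through the modified Beneš-Waksman network: the first three functions return a vector to the map array, while the reduction returns a scalar.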
References (13 articles)
1. "[1] M. McCOOL, A. D. ROBINSON and J. REINDERS, Structured Parallel Programming. Patterns for Efficient Computation, Morgan Kaufman, 2012.
[2] V. E. BENEŠ, Optimal Rearrangeable Multistage Connecting Networks, Bell System Technical Journal, vol. 43, no. 4, pp. 1646-1656, 1964.
[3] V. E. BENEŠ, Mathematical Theory of Connecting Networks and Telephone Traffic, Academic Press, New York, 1965.
[4] A. WAKSMAN, A Permutation Network, Journal of the ACM, vol. 15, no. 1, pp. 159-163, 1968.
[5] Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, Version 4.0, June 2021. Accessed June 15, 2023 [online]. Available: https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf