1. Minimizing communication overhead using pipelining for multi-dimensional FFT on distributed memory machines;Calvin,1993
2. An algorithm for the machine calculation of complex Fourier series;Cooley;Math. Comput.,1965
3. Efficient matrix transposition;Eklundh,1981
4. The scalability of FFT on parallel computers;Gupta;IEEE Trans. Parallel Dist. Systems,1992
5. Block algorithms for FFTs on vector and parallel computers;Hegland,1994