Affiliation:
1. Computer Science and Engineering, KOREATECH, Cheonan, Republic of Korea
Abstract
We propose a novel graphics processing unit (GPU) algorithm that can handle a large‐scale 3D fast Fourier transform (i.e., 3D‐FFT) problem whose data size is larger than the GPU's memory. A 1D FFT‐based 3D‐FFT computational approach is used to solve the limited device memory issue. Moreover, to reduce the communication overhead between the CPU and GPU, we propose a 3D data‐transposition method that converts the target 1D vector into a contiguous memory layout and improves data transfer efficiency. The transposed data are transferred between the host and device memories efficiently through a pinned buffer and multiple streams. We apply our method to various large‐scale benchmarks and compare its performance with the state‐of‐the‐art multicore CPU FFT library (i.e., Fastest Fourier Transform in the West [FFTW]) and a prior GPU‐based 3D‐FFT algorithm. Our method achieves up to 2.89 times higher performance than FFTW, and the performance gap widens as the data size increases. The performance of the prior GPU algorithm decreases considerably on massive‐scale problems, whereas our method's performance remains stable.
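The 1D FFT‐based decomposition mentioned in the abstract can be illustrated with a minimal CPU sketch. This is not the authors' implementation: it uses no GPU, pinned buffer, or streams, and the function names `fft1d` and `fft3d_by_1d` are hypothetical. It only shows that a 3D FFT can be computed as three passes of independent 1D FFTs, and that for the strided axes each target line is first gathered into a contiguous buffer, which is the role the transposition step plays before transfer. Each dimension is assumed to be a power of two.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft3d_by_1d(vol):
    """3D FFT of vol[i][j][k] computed as three passes of 1D FFTs."""
    n0, n1, n2 = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in vol]

    # Pass 1: axis 2 is the innermost index, so each line is already contiguous.
    for i in range(n0):
        for j in range(n1):
            out[i][j] = fft1d(out[i][j])

    # Pass 2: axis 1 is strided. Gather each line into a contiguous buffer
    # (a stand-in for transposition), transform it, and scatter it back.
    for i in range(n0):
        for k in range(n2):
            line = fft1d([out[i][j][k] for j in range(n1)])
            for j in range(n1):
                out[i][j][k] = line[j]

    # Pass 3: axis 0 has the largest stride; same gather, transform, scatter.
    for j in range(n1):
        for k in range(n2):
            line = fft1d([out[i][j][k] for i in range(n0)])
            for i in range(n0):
                out[i][j][k] = line[i]
    return out

if __name__ == "__main__":
    # A 2 x 4 x 2 volume with a single impulse at the origin.
    vol = [[[0j] * 2 for _ in range(4)] for _ in range(2)]
    vol[0][0][0] = 1 + 0j
    spectrum = fft3d_by_1d(vol)
    print(spectrum[1][3][1])
```

Because every line in a pass is independent of the others, the lines can be processed in batches small enough to fit a limited device memory, which is what makes this decomposition suitable for out‐of‐core problems.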
Funder
National Research Foundation of Korea
Subject
Electrical and Electronic Engineering; General Computer Science; Electronic, Optical and Magnetic Materials