The optimization of Fast Fourier Transform (FFT) problems that fit into GPU memory has been studied extensively. On-card FFT libraries such as CUFFT can generally achieve much better performance than their CPU counterparts, as the data transfer between CPU and GPU is usually not counted in their performance. This high performance, however, is limited by the relatively small GPU memory. When the FFT problem size grows, the data transfer, which typically goes over the PCI bus between system and GPU memory, can make up a substantial part of the overall execution time. Therefore, optimizations for FFT problems that outgrow GPU memory cannot bypass the tuning of data transfer between CPU and GPU. However, no prior study has addressed this problem.
This paper is the first effort to use GPUs to compute large FFTs that reside in the CPU memory of a single compute node.
Liang Gu earned his B.S. and M.S. degrees in Electrical and Computer Engineering at Shanghai Jiao Tong University in 2003 and 2006, respectively. He is a Ph.D. student under the advisement of Dr. Xiaoming Li in the ECE Department of UD.
His research interests include Fast Fourier Transform (FFT) performance modeling on general-purpose processors and FFT library generation on GPGPUs.