1. Improving performance of matrix multiplication and FFT on GPU;Cui,2009
2. Accelerating linpack with CUDA on heterogenous clusters;Fatica,2009
3. Benchmarking GPUs to tune dense linear algebra;Volkov,2008
4. 3D finite difference computation on GPUs using CUDA;Micikevicius,2009
5. Inside the FFT black box: serial and parallel fast Fourier transform algorithms;Chu,2000