Performance Optimization of 3D Lattice Boltzmann Flow Solver on a GPU

Published: 2017 Issue: Volume: 2017 Page: 1-16
ISSN: 1058-9244
Container-title: Scientific Programming
Language: en
Short-container-title: Scientific Programming

Author:

Tran Nhat-Phuong¹, Lee Myungho¹, Hong Sugwon¹

Affiliation:

1. Department of Computer Science and Engineering, Myongji University, 116 Myongji-ro, Cheoin-gu, Yongin, Gyeonggi-do, Republic of Korea

Abstract

The Lattice Boltzmann Method (LBM) is a powerful numerical method for simulating fluid flow. Its data-parallel nature makes it a promising candidate for parallel implementation on a GPU. The LBM, however, is heavily data intensive and memory bound. In particular, moving data to adjacent cells in the streaming phase incurs many uncoalesced memory accesses on the GPU, which degrades overall performance. Furthermore, the main computation kernels of the LBM use a large number of registers per thread, which limits the thread parallelism available at run time because the number of registers on the GPU is fixed. In this paper, we develop a high-performance parallelization of the LBM on a GPU that minimizes the overhead of uncoalesced memory accesses and improves cache locality by combining tiling with a data layout change. We also aggressively reduce the register usage of the LBM kernels to increase run-time thread parallelism. Experimental results on the Nvidia Tesla K20 GPU show that our approach delivers a throughput of 1210.63 Million Lattice Updates Per Second (MLUPS).
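To make the coalescing and throughput terms in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a D3Q19 lattice (19 distribution values per cell), one GPU thread per cell, and a switch from an array-of-structures (AoS) to a structure-of-arrays (SoA) layout, which is one common form of data layout change; the abstract does not state which layout the paper uses. All function names and grid sizes are hypothetical.

```python
# Illustrative sketch only: names, sizes, and the D3Q19 choice are assumptions,
# not details taken from the paper.
Q = 19  # distribution functions per cell (D3Q19 assumed)


def aos_index(cell, q, n_cells):
    """Array-of-structures: the Q values of one cell are stored contiguously."""
    return cell * Q + q


def soa_index(cell, q, n_cells):
    """Structure-of-arrays: direction q of all cells is stored contiguously."""
    return q * n_cells + cell


def neighbour_thread_stride(index_fn, q, n_cells):
    """Element distance between the addresses read by thread t and thread t+1,
    assuming thread t processes cell t. A stride of 1 is what allows the
    hardware to coalesce the accesses of adjacent threads."""
    return index_fn(1, q, n_cells) - index_fn(0, q, n_cells)


def mlups(nx, ny, nz, steps, seconds):
    """Million Lattice Updates Per Second for an nx*ny*nz grid."""
    return nx * ny * nz * steps / seconds / 1e6


n_cells = 64 * 64 * 64
aos_stride = neighbour_thread_stride(aos_index, 3, n_cells)  # 19 elements apart
soa_stride = neighbour_thread_stride(soa_index, 3, n_cells)  # 1 element apart
print(aos_stride, soa_stride)
```

Under these assumptions, adjacent threads reading the same direction are 19 elements apart in the AoS layout but only 1 element apart in the SoA layout, which is why the latter tends to coalesce. The `mlups` helper simply divides the total number of cell updates by the elapsed time; for example, a hypothetical 128×128×128 grid advanced 1000 steps in 2 seconds would correspond to about 1048.6 MLUPS.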

Funder

Next-Generation Information Computing Development Program through the National Research Foundation of Korea

Publisher

Hindawi Limited

Subject

Computer Science Applications, Software

Link

http://downloads.hindawi.com/journals/sp/2017/1205892.pdf

References: 13 articles.

1. TeraFLOP computing on a desktop PC with GPUs for 3D CFD

2. Fluid flow simulation on the Cell Broadband Engine using the lattice Boltzmann method

3. Performance analysis and optimization strategies for a D3Q19 lattice Boltzmann kernel on nVIDIA GPUs using CUDA