Affiliation:
1. School of Computer Science and Technology, Qingdao University, Qingdao 266071, China
Abstract
Currently, research on the lattice Boltzmann method mainly focuses on its numerical simulation and applications, and there is an increasing demand for large-scale simulations in practical scenarios. In response to this situation, this study successfully implemented a large-scale heterogeneous parallel algorithm for the lattice Boltzmann method using OpenMP, MPI, Pthread, and OpenCL parallel technologies on the “Dongfang” supercomputer system. The accuracy and effectiveness of this algorithm were verified through the lid-driven cavity flow simulation. The paper focused on optimizing the algorithm in four aspects: Firstly, non-blocking communication was employed to overlap communication and computation, thereby improving parallel efficiency. Secondly, high-speed shared memory was utilized to enhance memory access performance and reduce latency. Thirdly, a balanced computation between the central processing unit and the accelerator was achieved through proper task partitioning and load-balancing strategies. Lastly, memory access efficiency was improved by adjusting the memory layout. Performance testing demonstrated that the optimized algorithm exhibited improved parallel efficiency and scalability, with computational performance that is 4 times greater than before optimization and 20 times that of a 32-core CPU.
Funder
the Shandong Province Natural Science Foundation
GHFund A
the National Key R&D Program of China