Affiliation:
1. Department of Mechanical and Aerospace Engineering, University of California, Davis, CA, USA
2. Department of Electrical and Computer Engineering, University of California, Davis, CA, USA
Abstract
In this article, we describe the strategies and programming techniques used in porting a multidisciplinary fluid/thermal interaction procedure to graphics processing units (GPUs). We discuss the strategies for selecting which disciplines and routines are run on GPUs rather than CPUs, and we describe the programming techniques employed, including the use of the Compute Unified Device Architecture (CUDA), mixed-language (Fortran/C/CUDA) programming, Fortran/C memory mapping of arrays, and GPU optimization. We solve all equations using a multi-block, structured-grid, finite-volume numerical technique, with a dual time-step scheme for unsteady simulations. Our numerical solver targets CUDA-capable GPUs produced by NVIDIA. We use NVIDIA Tesla C2050/C2070 GPUs based on the Fermi architecture and compare the resulting performance against Intel Xeon X5690 CPUs. Individual solver routines converted to CUDA typically run about 10 times faster on a GPU for sufficiently dense computational grids. For a conjugate cylinder test case, we ran turbulent steady-flow simulations on four increasingly dense computational grids. The densest grid is divided into 13 blocks, each containing 1033×1033 grid points, for a total of 13.87 million grid points (1.07 million grid points per block). Comparing the performance of eight GPUs to that of eight CPUs, we obtain an overall speedup of about 6.0 on the densest grid; equivalently, the 8-GPU simulation runs about 39.5 times faster than a single-CPU simulation.
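To make the mixed-language approach concrete, the following is a minimal sketch, not the authors' actual code, of a C-callable CUDA wrapper of the kind a Fortran solver could invoke; the routine name scale_residual_, its arguments, and the launch parameters are hypothetical. It illustrates the two techniques named in the abstract: an extern "C" wrapper so the Fortran side can call a CUDA routine, and a flat indexing convention that maps Fortran's column-major array storage directly onto device memory without a transpose.

#include <cuda_runtime.h>

// Kernel: one thread per grid point of the flattened block array.
__global__ void scale_residual(double *res, const double *dt, double factor, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        res[idx] *= factor * dt[idx];
}

// C wrapper callable from Fortran. The trailing underscore matches the
// name-mangling convention of many Fortran compilers, and all arguments
// are passed by reference, as Fortran does by default.
extern "C" void scale_residual_(double *res, double *dt, double *factor,
                                int *ni, int *nj)
{
    // A Fortran array res(ni,nj) is stored column-major; flattening it to
    // idx = (i-1) + (j-1)*ni lets the C/CUDA side reuse the same memory
    // layout unchanged.
    int n = (*ni) * (*nj);
    size_t bytes = (size_t)n * sizeof(double);

    double *d_res, *d_dt;
    cudaMalloc((void **)&d_res, bytes);
    cudaMalloc((void **)&d_dt, bytes);
    cudaMemcpy(d_res, res, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_dt, dt, bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_residual<<<blocks, threads>>>(d_res, d_dt, *factor, n);

    cudaMemcpy(res, d_res, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_res);
    cudaFree(d_dt);
}

In a production solver, copying block arrays between host and device on every call would dominate the runtime, so data would normally be kept resident on the GPU across iterations; the sketch only illustrates the calling convention and the Fortran/C index mapping.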
Subject
Hardware and Architecture, Theoretical Computer Science, Software