An Implementation of LASER Beam Welding Simulation on Graphics Processing Unit Using CUDA

Published: 2024-04-17
Volume: 12, Issue: 4, Page: 83
ISSN: 2079-3197
Container-title: Computation
Language: en

Author:

Nascimento Ernandes¹, Magalhães Elisan¹, Azevedo Arthur¹, Paes Luiz E. S.², Oliveira Ariel¹

Affiliation:

1. Aeronautics Institute of Technology—ITA, São José dos Campos 12228-900, SP, Brazil

2. Faculty of Mechanical Engineering, Federal University of Uberlândia—UFU, Uberlândia 38410-337, MG, Brazil

Abstract

In traditional CFD solutions, the maximum number of parallel threads is limited by the capacity of the Central Processing Unit (CPU), which is lower than that of a modern Graphics Processing Unit (GPU). A GPU can process many parallel threads simultaneously in double-precision floating-point arithmetic. The present study evaluated the advantages and drawbacks of implementing LASER Beam Welding (LBW) simulations on the CUDA platform. The performance of the developed code was compared with that of three top-rated commercial codes executed on the CPU. The unsteady three-dimensional heat conduction Partial Differential Equation (PDE) was discretized in space and time using the Finite Volume Method (FVM). The Volumetric Thermal Capacitor (VTC) approach was employed to model melting and solidification. The GPU solutions were computed with an in-house code written in CUDA C, running on a Gigabyte Nvidia GeForce RTX™ 3090 video card and an MSI 4090 video card (both made in Hsinchu, Taiwan), each with 24 GB of memory. The commercial solutions were executed on an Intel® Core™ i9-12900KF CPU (made in Hillsboro, Oregon, United States of America) with a 3.6 GHz base clock and 16 cores. The results demonstrated that GPU and CPU processing achieve similar precision, but the GPU solution was significantly faster and more power-efficient, with speed-ups ranging from 75.6 to 1351.2 times compared to the CPU solutions. The in-house code also used memory more economically, with an average of 3.86 times less RAM utilization. Therefore, running parallelized algorithms on a GPU can reduce CFD computational costs compared to traditional codes while maintaining high accuracy.
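The abstract names the Finite Volume Method for unsteady three-dimensional heat conduction but does not give the scheme itself. As a rough illustration only, the sketch below shows one explicit (forward Euler) finite-volume update on a uniform cubic grid. Everything specific here is an assumption and not taken from the paper: the explicit time scheme, constant thermal diffusivity `alpha`, fixed-temperature boundary cells, the absence of a laser source term and of the VTC phase-change model, and the names `fvm_heat_step` and `make_grid`. It is written in plain Python for readability, not in CUDA C.

```python
def make_grid(n, value):
    """Hypothetical helper: an n x n x n nested list filled with `value`."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def fvm_heat_step(T, alpha, dx, dt):
    """Advance T[i][j][k] by one explicit finite-volume time step.

    Assumptions (not from the paper): uniform cubic cells of size dx,
    constant diffusivity alpha, boundary cells held at fixed temperature,
    no heat source and no phase change.
    """
    nx, ny, nz = len(T), len(T[0]), len(T[0][0])
    fo = alpha * dt / dx ** 2  # cell Fourier number
    if fo > 1.0 / 6.0:
        # Stability limit of the explicit scheme on a uniform 3D grid.
        raise ValueError("unstable time step: need alpha*dt/dx^2 <= 1/6")

    # Copy the old field so every update reads only old-time values.
    new = [[row[:] for row in plane] for plane in T]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                # Sum of the six face-neighbor temperatures.
                neighbors = (T[i - 1][j][k] + T[i + 1][j][k]
                             + T[i][j - 1][k] + T[i][j + 1][k]
                             + T[i][j][k - 1] + T[i][j][k + 1])
                # Net conductive flux through the six faces, scaled by fo.
                new[i][j][k] = T[i][j][k] + fo * (neighbors - 6.0 * T[i][j][k])
    return new


# Illustrative values only: a single hot cell in a cooler block.
grid = make_grid(5, 300.0)
grid[2][2][2] = 1800.0
grid = fvm_heat_step(grid, alpha=1.0e-5, dx=1.0e-3, dt=0.01)
```

Because each interior cell is updated from old-time values only, the loop iterations are independent of one another. That independence is the property a GPU code can exploit by assigning cells to threads; how the paper's in-house kernel actually does this is not described in the abstract.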

Funder

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior

Conselho Nacional de Desenvolvimento Científico e Tecnológico

Petróleo Brasileiro S.A.

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-3197/12/4/83/pdf

Cited by 1 article.

1. Numerical Estimation of Nonlinear Thermal Conductivity of SAE 1020 Steel;Computation;2024-05-04