Affiliation:
1. Technical University of Munich, Garching, Germany
2. Uppsala University, Uppsala, Sweden
Abstract
This article presents matrix-free finite-element techniques for efficiently solving partial differential equations on modern many-core processors, such as graphics cards. We develop a GPU parallelization of a matrix-free geometric multigrid iterative solver targeting moderate and high polynomial degrees, with support for general curved and adaptively refined hexahedral meshes with hanging nodes. The central algorithmic component is the matrix-free operator evaluation with sum factorization. We compare the node-level performance of our implementation running on an Nvidia Pascal P100 GPU to a highly optimized multicore implementation running on comparable Intel Broadwell CPUs and an Intel Xeon Phi. Our experiments show that the GPU implementation is approximately 1.5 to 2 times faster across four different scenarios of the Poisson equation and a variety of element degrees in 2D and 3D. The lowest time to solution per degree of freedom is recorded for moderate polynomial degrees between 3 and 5. A detailed performance analysis highlights the capabilities of the GPU architecture and the chosen execution model with threading within the element, particularly with respect to the evaluation of the matrix-vector product. Atomic intrinsics are shown to provide a fast way to avoid the race conditions that can arise when summing the elemental residuals into global vector entries associated with shared vertices, edges, and surfaces. In addition, the solver infrastructure supports mixed-precision arithmetic that performs the multigrid V-cycle in single precision with an outer correction in double precision, increasing throughput by up to 83%.
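The sum factorization named in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation: the function names are hypothetical, `S` stands for a generic 1D matrix rather than the actual shape-function or derivative matrices, and the example is restricted to 2D and plain Python. It shows only the core idea, that a tensor-product operator (S ⊗ S) can be applied one coordinate direction at a time instead of as one large dense matrix.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def apply_naive(S, u):
    """Apply the full Kronecker matrix (S ⊗ S) to u: O(n^4) work in 2D."""
    n = len(S)
    out = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                for l in range(n):
                    acc += S[i][k] * S[j][l] * u[k * n + l]
            out[i * n + j] = acc
    return out


def apply_sum_factorized(S, u):
    """Apply (S ⊗ S) to u direction by direction: O(n^3) work in 2D."""
    n = len(S)
    # View the flat vector as an n x n array of values (row-major).
    U = [u[k * n:(k + 1) * n] for k in range(n)]
    # (S ⊗ S) vec(U) equals vec(S U S^T): two small 1D sweeps.
    V = matmul(matmul(S, U), transpose(S))
    return [v for row in V for v in row]
```

Both functions return the same vector. The factorized version replaces one product with an n² × n² matrix by two sweeps with the n × n matrix, and the saving grows with the polynomial degree and the space dimension.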
Funder
Deutsche Forschungsgemeinschaft
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Theory and Mathematics, Computer Science Applications, Hardware and Architecture, Modeling and Simulation, Software
Cited by
48 articles.