Affiliation:
1. AMD, Boulder, Colorado, USA
2. Department of Computer Science and Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
3. Pacific Northwest National Laboratory, Richland, Washington, USA
4. National Renewable Energy Laboratory, Golden, Colorado, USA
Abstract
Incomplete LU (ILU) smoothers are effective in the algebraic multigrid (AMG) V‐cycle for reducing high‐frequency components of the error. However, the requisite direct triangular solves are comparatively slow on GPUs. Previous work has demonstrated the advantages of Jacobi iteration as an alternative to direct solution of these systems. Depending on the threshold and fill‐level parameters chosen, the factors can be highly nonnormal, and Jacobi is then unlikely to converge in a small number of iterations. We demonstrate that row scaling can reduce the departure from normality, allowing us to replace the inherently sequential solve with a rapidly converging Richardson iteration. There are several advantages beyond the lower compute time. Scaling is performed locally for a diagonal block of the global matrix because it is applied directly to the factor. Further, an ILUT Schur complement smoother maintains a constant GMRES iteration count as the number of MPI ranks increases, and thus parallel strong‐scaling is improved. Our algorithms have been incorporated into hypre, and we demonstrate improved time to solution for linear systems arising in the Nalu‐Wind and PeleLM pressure solvers. For large problem sizes, GMRES+AMG executes at least five times faster when using iterative triangular solves compared with direct solves on massively parallel GPUs.
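The idea of replacing a sequential triangular solve with a fixed number of stationary iterations can be illustrated with a minimal NumPy sketch. It is not the hypre implementation: the function name `richardson_lower_solve` is hypothetical, and the scaling used here (dividing each row by its diagonal entry) is an assumption, since the abstract does not specify the scaling strategy. With this particular scaling, Richardson iteration on the scaled system coincides with Jacobi iteration on the original one.

```python
import numpy as np

def richardson_lower_solve(L, b, num_iters):
    """Approximate the solution of L x = b for a lower-triangular L.

    Hypothetical helper for illustration only. Rows are scaled by the
    diagonal (an assumed, simple choice of row scaling), then Richardson
    iteration x <- x + (b_s - L_s x) is applied to the scaled system.
    """
    d = np.diag(L)
    L_scaled = L / d[:, None]      # row scaling gives a unit diagonal
    b_scaled = b / d
    x = np.zeros_like(b_scaled)    # zero initial guess
    for _ in range(num_iters):
        # Each sweep is a matrix-vector product, which parallelizes well,
        # unlike forward substitution.
        x = x + (b_scaled - L_scaled @ x)
    return x

# Small dense example; a real ILU factor would be sparse.
L = np.array([[2.0, 0.0, 0.0],
              [1.0, 4.0, 0.0],
              [0.5, -1.0, 5.0]])
b = np.array([2.0, 6.0, 4.5])

x_iter = richardson_lower_solve(L, b, num_iters=3)
x_direct = np.linalg.solve(L, b)
```

Because the iteration matrix of the scaled system is strictly lower triangular, it is nilpotent, so n sweeps reproduce the direct solve for an n-by-n factor up to rounding. The practical benefit comes when far fewer sweeps already give an accurate enough result, which depends on how far the factor departs from normality.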
Subject
Applied Mathematics, Computer Science Applications, Mechanical Engineering, Mechanics of Materials, Computational Mechanics
Cited by
1 article.