Performance portable ice-sheet modeling with MALI

Published:2023-06-27 Issue:5 Volume:37 Page:600-625
ISSN:1094-3420
Container-title:The International Journal of High Performance Computing Applications
language:en

Author:

Watkins Jerry¹, Carlson Max¹, Shan Kyle², Tezaur Irina¹, Perego Mauro³, Bertagna Luca³, Kao Carolyn⁴, Hoffman Matthew J⁵, Price Stephen F⁵

Affiliation:

1. Sandia National Laboratories, Livermore, CA, USA

2. Micron Technology, Boise, ID, USA

3. Sandia National Laboratories, Albuquerque, NM, USA

4. TSMC, Hsinchu, Taiwan

5. Los Alamos National Laboratory, Los Alamos, NM, USA

Abstract

High-resolution simulations of polar ice sheets play a crucial role in the ongoing effort to develop more accurate and reliable Earth system models for probabilistic sea-level projections. These simulations often require a massive amount of memory and computation from large supercomputing clusters to provide sufficient accuracy and resolution; therefore, it has become essential to ensure performance on these platforms. Many of today’s supercomputers contain a diverse set of computing architectures and require specific programming interfaces in order to obtain optimal efficiency. In an effort to avoid architecture-specific programming and maintain productivity across platforms, the ice-sheet modeling code known as MPAS-Albany Land Ice (MALI) uses high-level abstractions to integrate Trilinos libraries and the Kokkos programming model for performance portable code across a variety of different architectures. In this article, we analyze the performance portable features of MALI via a performance analysis on current CPU-based and GPU-based supercomputers. The analysis highlights not only the performance portable improvements made in finite element assembly and multigrid preconditioning within MALI with speedups between 1.26 and 1.82x across CPU and GPU architectures but also identifies the need to further improve performance in software coupling and preconditioning on GPUs. We perform a weak scalability study and show that simulations on GPU-based machines perform 1.24–1.92x faster when utilizing the GPUs. The best performance is found in finite element assembly, which achieved a speedup of up to 8.65x and a weak scaling efficiency of 82.6% with GPUs. We additionally describe an automated performance testing framework developed for this code base using a changepoint detection method. The framework is used to make actionable decisions about performance within MALI. We provide several concrete examples of scenarios in which the framework has identified performance regressions, improvements, and algorithm differences over the course of 2 years of development.
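
The abstract does not say which changepoint detection method the testing framework uses. As a rough illustration of the general idea only, the Python sketch below scans a series of nightly wall-clock timings for the single split that best separates two different mean levels (a least-squares mean-shift scan) and flags it when the relative change is large. The function names, the minimum segment length, and the 10% threshold are all hypothetical choices made for this example, not details taken from the article.

```python
from statistics import mean


def best_mean_shift(timings, min_segment=3):
    """Find the split index that most reduces the squared error when the
    series is modeled as two constant segments instead of one.

    Returns (index, gain); index is None if no split helps or the series
    is too short. This is a generic illustration, not the article's method.
    """
    n = len(timings)
    if n < 2 * min_segment:
        return None, 0.0

    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    total = sse(timings)
    best_index, best_gain = None, 0.0
    for k in range(min_segment, n - min_segment + 1):
        gain = total - (sse(timings[:k]) + sse(timings[k:]))
        if gain > best_gain:
            best_index, best_gain = k, gain
    return best_index, best_gain


def detect_change(timings, threshold=0.10, min_segment=3):
    """Return (index, relative_change) if the best split shifts the mean
    by at least `threshold` (hypothetical 10% default), else None."""
    k, _ = best_mean_shift(timings, min_segment)
    if k is None:
        return None
    before, after = mean(timings[:k]), mean(timings[k:])
    relative_change = (after - before) / before
    if abs(relative_change) >= threshold:
        return k, relative_change
    return None


if __name__ == "__main__":
    # Made-up timings (seconds) with a slowdown starting at the sixth run.
    runs = [10.0, 10.1, 9.9, 10.0, 10.2, 12.0, 12.1, 11.9, 12.0, 12.2]
    print(detect_change(runs))
```

A positive relative change would indicate a slowdown after the split and a negative one an improvement. A production framework would typically need to handle multiple changepoints and noisy timings, which this sketch does not attempt.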

Funder

Office of Science

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

Link

http://journals.sagepub.com/doi/pdf/10.1177/10943420231183688
