Efficient exascale discretizations: High-order finite element methods

Author:

Kolev Tzanio (1), Fischer Paul (2, 3, 4), Min Misun (2), Dongarra Jack (5), Brown Jed (6), Dobrev Veselin (1), Warburton Tim (7), Tomov Stanimire (5), Shephard Mark S (8), Abdelfattah Ahmad (5), Barra Valeria (6), Beams Natalie (5), Camier Jean-Sylvain (1), Chalmers Noel (9), Dudouit Yohann (1), Karakus Ali (10), Karlin Ian (1), Kerkemeier Stefan (2), Lan Yu-Hsiang (2), Medina David (11), Merzari Elia (2, 12), Obabko Aleksandr (2), Pazner Will (1), Rathnayake Thilina (3), Smith Cameron W (5), Spies Lukas (3), Swirydowicz Kasia (13), Thompson Jeremy (6), Tomboulides Ananias (2, 14), Tomov Vladimir (1)

Affiliation:

1. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA

2. Mathematics and Computer Science, Argonne National Laboratory, Lemont, IL, USA

3. Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL, USA

4. Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA

5. Innovative Computing Laboratory, University of Tennessee, Knoxville, TN, USA

6. Department of Computer Science, University of Colorado, Boulder, CO, USA

7. Department of Mathematics, Virginia Tech, Blacksburg, VA, USA

8. Scientific Computation Research Center, Rensselaer Polytechnic Institute, Troy, NY, USA

9. AMD Research, Austin, TX, USA

10. Mechanical Engineering Department, Middle East Technical University, Ankara, Turkey

11. Occalytics LLC, Weehawken, NJ, USA

12. Department of Nuclear Engineering, Penn State, PA, USA

13. Pacific Northwest National Laboratory, WA, USA

14. Department of Mechanical Engineering, Aristotle University of Thessaloniki, Greece

Abstract

Efficient exploitation of exascale architectures requires rethinking the numerical algorithms used in many large-scale applications. These architectures favor algorithms that expose ultra fine-grain parallelism and maximize the ratio of floating-point operations to energy-intensive data movement. One of the few viable approaches to achieving high efficiency in PDE discretizations on unstructured grids is to use matrix-free/partially assembled high-order finite element methods, since these methods can increase the accuracy and/or lower the computational time due to reduced data motion. In this paper we provide an overview of the research and development activities in the Center for Efficient Exascale Discretizations (CEED), a co-design center in the Exascale Computing Project (ECP) that is focused on the development of next-generation discretization software and algorithms to enable a wide range of finite element applications to run efficiently on future hardware. CEED is a research partnership involving more than 30 computational scientists from two US national labs and five universities, including members of the Nek5000, MFEM, MAGMA and PETSc projects. We discuss the CEED co-design activities based on targeted benchmarks, miniapps and discretization libraries, and our work on performance optimizations for large-scale GPU architectures. We also provide a broad overview of research and development activities in areas such as unstructured adaptive mesh refinement algorithms, matrix-free linear solvers, and high-order data visualization, and we list examples of collaborations with several ECP and external applications.
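The matrix-free idea mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper or from any CEED library; the function names (`apply_mass_matrix_free`, `assemble_mass`, and the helpers) are hypothetical, and the setting is assumed to be a single reference square with a bilinear tensor-product basis and 2-point Gauss quadrature. The sketch applies the element mass operator through small 1D basis matrices instead of storing the assembled element matrix, and checks that both routes give the same result.

```python
import math

def matmul(A, B):
    # Dense product of two small matrices stored as lists of rows.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def basis_1d():
    # Linear Lagrange basis on [0, 1] evaluated at 2 Gauss points.
    g = 0.5 / math.sqrt(3.0)
    pts = [0.5 - g, 0.5 + g]
    wts = [0.5, 0.5]
    B = [[1.0 - x, x] for x in pts]  # B[q][i] = phi_i(x_q)
    return B, wts

def apply_mass_matrix_free(U, B, w):
    # U holds the element dofs as a p x p array (tensor-product layout).
    Bt = transpose(B)
    V = matmul(matmul(B, U), Bt)           # interpolate to quadrature points
    V = [[w[a] * w[b] * V[a][b] for b in range(len(w))]
         for a in range(len(w))]           # pointwise quadrature weights
    return matmul(matmul(Bt, V), B)        # apply the transposed basis

def assemble_mass(B, w):
    # Reference route: build the full (p*p) x (p*p) element mass matrix.
    p, q = len(B[0]), len(w)
    M1 = [[sum(w[k] * B[k][i] * B[k][j] for k in range(q))
           for j in range(p)] for i in range(p)]
    n = p * p
    return [[M1[i // p][j // p] * M1[i % p][j % p] for j in range(n)]
            for i in range(n)]

def apply_assembled(M, U):
    p = len(U)
    u = [U[i][j] for i in range(p) for j in range(p)]
    y = [sum(M[r][c] * u[c] for c in range(len(u))) for r in range(len(u))]
    return [y[i * p:(i + 1) * p] for i in range(p)]

B, w = basis_1d()
U = [[1.0, 2.0], [3.0, 4.0]]
Y_free = apply_mass_matrix_free(U, B, w)
Y_full = apply_assembled(assemble_mass(B, w), U)
```

In this sketch the matrix-free route stores only the q x p matrix `B` and the weights, whereas the assembled route stores p^2 x p^2 entries per element; that storage gap widens as the polynomial degree increases, which is the data-motion argument the abstract refers to.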

Funder

U.S. Department of Energy

Publisher

SAGE Publications

Subject

Hardware and Architecture, Theoretical Computer Science, Software

References: 69 articles.

1. High-performance Tensor Contractions for GPUs

2. Performance, Design, and Autotuning of Batched GEMM for GPUs

3. Ameen M, Patel S, Colmenares J, et al. (2020) Direct Numerical Simulation (DNS) and high-fidelity large-eddy simulations for improved prediction of in-cylinder flow and combustion processes. Technical report, DOE Vehicle Technologies Office Annual Merit Review.

4. MFEM: A modular finite element methods library

5. Monotonicity in high‐order curvilinear finite element arbitrary Lagrangian–Eulerian remap

Cited by 35 articles.

1. Acceleration of Tensor-Product Operations with Tensor Cores;ACM Transactions on Parallel Computing;2024-09-09

2. MAGMA: Enabling exascale performance with accelerated BLAS and LAPACK for diverse GPU architectures;The International Journal of High Performance Computing Applications;2024-06-20

3. High-performance finite elements with MFEM;The International Journal of High Performance Computing Applications;2024-06-14

4. Weak boundary conditions for Lagrangian shock hydrodynamics: A high-order finite element implementation on curved boundaries;Journal of Computational Physics;2024-06

5. Alya towards Exascale: Optimal OpenACC Performance of the Navier-Stokes Finite Element Assembly on GPUs;2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS);2024-05-27
