ECP libraries and tools: An overview

Author:

Heroux Michael A. (1), McInnes Lois Curfman (2), Ahrens James (3), Gamblin Todd (4), Germann Timothy C. (3), Li Xiaoye Sherry (5), Mohror Kathryn (4), Munson Todd (2), Shende Sameer (6), Thakur Rajeev (2), Vetter Jeffrey (7), Willenbring James (1)

Affiliation:

1. Center for Computing Research, Sandia National Laboratories, Albuquerque, NM, USA

2. Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA

3. Physics and Chemistry of Material Group, Los Alamos National Laboratory, Los Alamos, NM, USA

4. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA, USA

5. Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

6. Performance Research Laboratory, University of Oregon, Eugene, OR, USA

7. Advanced Computing Systems Research, Oak Ridge National Laboratory, Oak Ridge, TN, USA

Abstract

The Exascale Computing Project (ECP) Software Technology and Co-Design teams addressed the growing complexities in high-performance computing (HPC) by developing scalable software libraries and tools that leverage exascale system capabilities. As we enter the exascale era, the need for reusable, optimized software solutions that can handle the unique challenges posed by these systems becomes increasingly important. The primary challenge the ECP teams faced was to create software libraries and tools that are performant on exascale architectures as well as portable and usable across diverse hardware platforms. Efforts addressed issues related to concurrent execution, memory management, and the integration of heterogeneous computing resources, such as GPUs from multiple vendors. The ECP's strategy involved a structured development process encompassing the creation, optimization, and deployment of software in collaboration with industry, academia, and national laboratories. The project was organized into several technical areas: co-design of domain-specific suites with target applications, programming models and runtimes, development tools, mathematical libraries, data and visualization tools, and software ecosystem and delivery mechanisms. ECP has successfully developed a large portfolio of software libraries and tools that demonstrate significant improvements in performance and scalability on exascale systems. These products have been integrated into the Department of Energy's computing facilities, supporting various scientific applications and ensuring robust performance across different hardware setups. ECP advancements in software development for exascale computing highlight the importance of a collaborative and adaptive approach to handling the complexities of next-generation HPC systems. The lessons learned emphasize the need for continuous engagement with end users and vendors, and the importance of maintaining a balance between innovation and practical implementation. Future efforts will focus on ensuring scalability, keeping pace with rapid hardware advancements, and further enhancing the interoperability and usability of the software ecosystem. Subsequent articles in this special issue provide in-depth discussions and case studies of specific library and tool efforts.

Funder

Exascale Computing Project

Publisher

SAGE Publications
