Affiliation:
1. Sandia National Laboratories, Albuquerque, NM 87185, USA
Abstract
Preparing applications for a transition from petascale to exascale systems will require a very large investment in several areas of software research and development. The introduction of manycore nodes, the abundance of parallelism, an increase in system faults (including soft errors) and a complicated, multi-component software environment are some of the most challenging issues we face. In this paper we address four topics we believe to be the most challenging issues and therefore the greatest opportunities for making effective next-generation scalable applications. First and foremost is the need to transform existing applications to run on manycore platforms and properly design new applications. This is particularly challenging in the absence of a standard, portable manycore programming environment, but we can make progress in this direction while manycore programming models are developed. Second is promoting advanced modeling and simulation capabilities such as embedded optimization and uncertainty quantification that lead to higher quality results and orders of magnitude more parallelism. Third is progress toward fault resilience in applications, a critical need as system reliability degrades. Fourth and finally is a qualitative improvement in software design, including the social aspects, as exascale software systems will be increasingly multi-team and multi-faceted efforts.
Subject
Hardware and Architecture,Theoretical Computer Science,Software
Cited by
17 articles.
1. Neuro-Symbolic Approach to Certified Scientific Software Synthesis;Proceedings of the 1st ACM International Conference on AI-Powered Software;2024-07-10
2. iCheck: Leveraging RDMA and Malleability for Application-Level Checkpointing in HPC Systems;2022 IEEE 28th International Conference on Parallel and Distributed Systems (ICPADS);2023-01
3. Recovery of Distributed Iterative Solvers for Linear Systems Using Non-Volatile RAM;2022 IEEE/ACM 12th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS);2022-11
4. Fault Tolerant Lanczos Eigensolver via an Invariant Checking Method;Journal of Electronic Testing;2021-04-30
5. Zonal Flow Solver (ZFS): a highly efficient multi-physics simulation framework;International Journal of Computational Fluid Dynamics;2020-03-18