Authors
Sultana, Nawrin; Rüfenacht, Martin; Skjellum, Anthony; Laguna, Ignacio; Mohror, Kathryn
Funders
Lawrence Livermore National Laboratory
National Science Foundation
Subjects
Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Hardware and Architecture, Theoretical Computer Science, Software
References (32 articles)
1. Cappello. Toward exascale resilience. Int. J. High Perform. Comput. Appl., 2009.
2. Cappello. Fault tolerance in petascale/exascale systems: current knowledge, challenges and research opportunities. Int. J. High Perform. Comput. Appl., 2009.
3. Bland. Post-failure recovery of MPI communication capability: design and rationale. Int. J. High Perform. Comput. Appl., 2013.
4. Laguna. Evaluating and extending user-level fault tolerance in MPI applications. Int. J. High Perform. Comput. Appl., 2016.
5. Gamell. Exploring automatic, online failure recovery for scientific applications at extreme scales. 2014.
Cited by
10 articles.