Authors:
Reitz Lukas, Fohry Claudia
Publisher:
Springer Nature Switzerland