Authors:
Coti Camille, Petrucci Laure, Torres González Daniel Alberto
Publisher
Springer International Publishing
References: 31 articles.
1. Lecture Notes in Computer Science;G Aupy,2014
2. Benacchio, T., et al.: Resilience and fault-tolerance in high-performance computing for numerical weather and climate prediction. Int. J. High Perform. Comput. Appl. (2020)
3. Benoît, A., Cavelan, A., Cappello, F., Raghavan, P., Robert, Y., Sun, H.: Coping with silent and fail-stop errors at scale by combining replication and checkpointing. J. Parallel Distrib. Comput. 122, 209–225 (2018)
4. Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J.: An evaluation of user-level failure mitigation support in MPI. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds.) Recent Advances in the Message Passing Interface, pp. 193–203. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-33518-1_24
5. Bosilca, G., et al.: Failure detection and propagation in HPC systems. In: SC 2016: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 312–322 (2016)
Cited by
2 articles.
1. Rollback-Free Recovery for a High Performance Dense Linear Solver With Reduced Memory Footprint;IEEE Transactions on Parallel and Distributed Systems;2024-07
2. A Formal Model for Fault Tolerant Parallel Matrix Factorization;2022 26th International Conference on Engineering of Complex Computer Systems (ICECCS);2022-03