Authors:
Bland Wesley, Du Peng, Bouteiller Aurelien, Herault Thomas, Bosilca George, Dongarra Jack
Publisher:
Springer Berlin Heidelberg
References: 17 articles
1. Cappello, F., Casanova, H., Robert, Y.: Preventive migration vs. preventive checkpointing for extreme scale supercomputers. PPL 21(2), 111–132 (2011)
2. Cappello, F., Geist, A., Gropp, B., Kalé, L.V., Kramer, B., Snir, M.: Toward exascale resilience. IJHPCA 23(4), 374–388 (2009)
3. Chen, Z., Fagg, G.E., Gabriel, E., Langou, J., Angskun, T., Bosilca, G., Dongarra, J.: Fault tolerant high performance computing by a coding approach. In: Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2005, pp. 213–223. ACM, New York (2005)
4. Daly, J.T.: A higher order estimate of the optimum checkpoint interval for restart dumps. Future Gener. Comput. Syst. 22, 303–312 (2006)
5. Davies, T., Karlsson, C., Liu, H., Ding, C., Chen, Z.: High Performance Linpack Benchmark: A Fault Tolerant Implementation without Checkpointing. In: Proceedings of the 25th ACM International Conference on Supercomputing (ICS 2011). ACM (2011)
Cited by: 16 articles
1. An Algorithm-Based Fault Tolerance Strategy for the Bitonic Sort Parallel Algorithm; 2021 10th Latin-American Symposium on Dependable Computing (LADC); 2021-11
2. The Landscape of Exascale Research; ACM Computing Surveys; 2021-03-31
3. Serverless linear algebra; Proceedings of the 11th ACM Symposium on Cloud Computing; 2020-10-12
4. An Efficient In-Memory Checkpoint Method and its Practice on Fault-Tolerant HPL; IEEE Transactions on Parallel and Distributed Systems; 2018-04-01
5. Self-Checkpoint; Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming; 2017-01-26