Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols

Author:

Bosilca George, Bouteiller Aurelien, Herault Thomas, Lemarinier Pierre, Dongarra Jack J.

Publisher

Springer Berlin Heidelberg

Reference: 16 articles.

1. Meuer, H.W.: The top500 project: Looking back over 15 years of supercomputing experience. Informatik-Spektrum 31(3), 203–222 (2008)

2. The MPI Forum: MPI: a message passing interface. In: Supercomputing 1993: Proceedings of the 1993 ACM/IEEE conference on Supercomputing, pp. 878–883. ACM Press, New York (1993)

3. Fagg, G.E., Gabriel, E., Bosilca, G., Angskun, T., Chen, Z., Pjesivac-Grbovic, J., London, K., Dongarra, J.J.: Extending the MPI specification for process fault tolerance on high performance computing systems. In: Proceedings of the International Supercomputer Conference (ICS) 2004, Primeur (2004)

4. Lemarinier, P., Bouteiller, A., Herault, T., Krawezik, G., Cappello, F.: Improved message logging versus improved coordinated checkpointing for fault tolerant MPI. In: IEEE International Conference on Cluster Computing (Cluster 2004). IEEE CS Press, Los Alamitos (2004)

5. Bouteiller, A., Ropars, T., Bosilca, G., Morin, C., Dongarra, J.: Reasons to be pessimist or optimist for failure recovery in high performance clusters. In: IEEE (ed.): Proceedings of the 2009 IEEE Cluster Conference, New Orleans, Louisiana, USA (2009)

Cited by 5 articles.

1. Local rollback for resilient MPI applications with application-level checkpointing and message logging;Future Generation Computer Systems;2019-02

2. Fault-Tolerant MPI;Computer Communications and Networks;2015

3. Correlated set coordination in fault tolerant message logging protocols for many-core clusters;Concurrency and Computation: Practice and Experience;2012-07-12

4. HydEE: Failure Containment without Event Logging for Large Scale Send-Deterministic MPI Applications;2012 IEEE 26th International Parallel and Distributed Processing Symposium;2012-05

5. Correlated Set Coordination in Fault Tolerant Message Logging Protocols;Euro-Par 2011 Parallel Processing;2011
