A Case Study of Application Structure Aware Resilience Through Differentiated State Saving and Recovery
Authors:
Dubey Anshu, Fujita Hajime, Rubenstein Zachary, Van Straalen Brian, Chien Andrew A.
Publisher:
Springer International Publishing
References: 16 articles.
1. Berrocal, E., Bautista-Gomez, L., Di, S., Lan, Z., Cappello, F.: Lightweight silent data corruption detection based on runtime data analysis for HPC applications. Technical report (2014)
2. Chung, J., Lee, I., Sullivan, M., Ryoo, J.H., Kim, D.W., Yoon, D.H., Kaplan, L., Erez, M.: Containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems. In: The Proceedings of SC12 (2012)
3. Colella, P., Graves, D., Keen, N., Ligocki, T., Martin, D., McCorquodale, P., Modiano, D., Schwartz, P., Sternberg, T., Van Straalen, B.: Chombo software package for AMR applications design document. Technical report, LBNL, Applied Numerical Algorithms Group, Computational Research Division (2009)
4. Dubey, A., Antypas, K., Ganapathy, M., Reid, L., Riley, K., Sheeler, D., Siegel, A., Weide, K.: Extensible component-based architecture for FLASH, a massively parallel, multiphysics simulation code. Parallel Comput. 35(10–11), 512–522 (2009)
5. Dubey, A., Reid, L., Fisher, R.: Introduction to FLASH 3.0, with application to supersonic turbulence. In: Physica Scripta T132: Topical Issue on Turbulent Mixing and Beyond, Results of a Conference at ICTP. Trieste, Italy, August (2008)