1. P. Kogge, K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, R.S. Williams, K. Yelick, Exascale computing study: Technology challenges in achieving exascale systems, 2008.
2. J. Dongarra, P. Beckman, T. Moore, P. Aerts, G. Aloisio, D. Barkai, T. Boku, B. Chapman, X. Chi, A. Choudhary, S. Dosanjh, T. Dunning, R. Fiore, A. Geist, R. Harrison, M. Hereld, M. Heroux, K. Hotta, Y. Ishikawa, Z. Jin, F. Johnson, S. Kale, R. Kenway, D. Keyes, B. Kramer, J. Labarta, A. Lichnewsky, B. Lucas, S. Matsuoka, P. Messina, P. Michielse, B. Mohr, M. Mueller, J. Shalf, D. Skinner, M. Snir, T. Sterling, R. Stevens, F. Streitz, B. Sugar, A. van der Steen, J. Vetter, P. Williams, R. Wisniewski, K. Yelick, The international exascale software project roadmap 1.
3. G. Zheng, L. Shi, L.V. Kalé, FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI, in: 2004 IEEE Cluster, San Diego, CA, 2004, pp. 93–103.
4. A. Moody, G. Bronevetsky, K. Mohror, B.R. de Supinski, Design, modeling, and evaluation of a scalable multi-level checkpointing system, in: SC, 2010, pp. 1–11.
5. E. Meneses, G. Bronevetsky, L.V. Kale, Evaluation of simple causal message logging for large-scale fault tolerant HPC systems, in: 16th IEEE Workshop on Dependable Parallel, Distributed and Network-Centric Systems in 25th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2011), 2011.