1. Combining backward and forward recovery to cope with silent errors in iterative solvers;Fasi,2015
2. Design, modeling, and evaluation of a scalable multi-level checkpointing system;Moody,2010
3. Fault tolerance in petascale/exascale systems: current knowledge, challenges and research opportunities;Cappello;Int. J. High Perform. Comput. Appl.,2009
4. Toward exascale resilience;Cappello;Int. J. High Perform. Comput. Appl.,2009
5. Toward exascale resilience: 2014 update;Cappello;Supercomputing Frontiers and Innovations,2014