1. Super-scalable algorithms for computing on 100,000 processors;Engelmann,2005
2. Recovery patterns for iterative methods in a parallel unstable environment;Bosilca;SIAM Journal on Scientific Computing (SISC),2007
3. Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources;Chen,2006
4. Fault tolerant algorithms for heat transfer problems;Ltaief;Journal of Parallel and Distributed Computing (JPDC),2008
5. BigSim: a parallel simulator for performance prediction of extremely large parallel machines;Zheng,2004