1. APEX workflows (version 1), LANL, 2015.
2. APEX workflows (version 2), LANL, 2016.
3. M. Gamell, D.S. Katz, H. Kolla, J. Chen, S. Klasky, M. Parashar, Exploring Automatic, Online Failure Recovery for Scientific Applications at Extreme Scales, in: SC ’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2014, pp. 895–906.
4. A. Eisenman, K.K. Matam, S. Ingram, D. Mudigere, R. Krishnamoorthi, K. Nair, M. Smelyanskiy, M. Annavaram, Check-N-Run: A checkpointing system for training deep learning recommendation models, in: 19th USENIX Symposium on Networked Systems Design and Implementation, NSDI 22, 2022, pp. 929–943.
5. Cappello, Use cases of lossy compression for floating-point data in scientific data sets, Int. J. High Perform. Comput. Appl., 2019.