1. Belady, L.A.: A study of replacement algorithms for a virtual-storage computer. IBM Syst. J. 5(2), 78–101 (1966)
2. Bian, Z., et al.: Colossal-AI: a unified deep learning system for large-scale parallel training. CoRR abs/2110.14883 (2021)
3. Bian, Z., Xu, Q., Wang, B., You, Y.: Maximizing parallelism in distributed training for huge neural networks. CoRR abs/2105.14450 (2021)
4. Brown, T.B., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems, NeurIPS (2020)
5. Chen, T., Xu, B., Zhang, C., Guestrin, C.: Training deep nets with sublinear memory cost. CoRR abs/1604.06174 (2016)