1. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
2. SchedTune: A Heterogeneity-Aware GPU Scheduler for Deep Learning
3. Varuna
4. Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M. S., Bohg, J., Bosselut, A., Brunskill, E., et al. On the opportunities and risks of foundation models. arXiv:2108.07258 (2021).
5. Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. NeurIPS (2020).