1. Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
2. George Amvrosiadis, Jun Woo Park, Gregory R Ganger, Garth A Gibson, Elisabeth Baseman, and Nathan DeBardeleben. Bigger, longer, fewer: What do cluster jobs look like outside Google?, 2017.
3. Eric Boutin, Jaliya Ekanayake, Wei Lin, Bing Shi, Jingren Zhou, Zhengping Qian, Ming Wu, and Lidong Zhou. Apollo: Scalable and coordinated scheduling for cloud-scale computing. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 285--300, 2014.
4. Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
5. Balancing efficiency and fairness in heterogeneous GPU clusters for deep learning