1. 2021. NVIDIA NCCL. https://developer.nvidia.com/NCCL.
2. 2022. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22). USENIX Association, Renton, WA. https://www.usenix.org/conference/nsdi22/presentation/weng
3. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, Manjunath Kudlur, Josh Levenberg, Rajat Monga, Sherry Moore, Derek G. Murray, Benoit Steiner, Paul Tucker, Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2016. TensorFlow: A system for large-scale machine learning. In USENIX Symposium on Operating Systems Design and Implementation (OSDI). 265--283.
4. Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, and Dimitris Papailiopoulos. 2021. On the Utility of Gradient Compression in Distributed Training Systems. arXiv preprint arXiv:2103.00543 (2021).
5. Alham Fikri Aji and Kenneth Heafield. 2017. Sparse communication for distributed gradient descent.