1. Weng, Q., Xiao, W., Yu, Y., Wang, W., Wang, C., He, J., Li, Y., Zhang, L., Lin, W., and Ding, Y. (2022, April 4–6). MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters. Proceedings of the 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), USENIX Association, Renton, WA, USA.
2. Jeon, M., Venkataraman, S., Phanishayee, A., Qian, J., Xiao, W., and Yang, F. (2019, July 10–12). Analysis of Large-Scale Multi-Tenant GPU Clusters for DNN Training Workloads. Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19), USENIX Association, Renton, WA, USA.
3. Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., and Kalro, A. (2018, February 24–28). Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. Proceedings of the 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA), IEEE, Vienna, Austria.
4. Mohan, J., Phanishayee, A., Kulkarni, J., and Chidambaram, V. (2022, July 11–13). Looking Beyond GPUs for DNN Scheduling on Multi-Tenant Clusters. Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), USENIX Association, Carlsbad, CA, USA.
5. Zhao, H., Han, Z., Yang, Z., Zhang, Q., Yang, F., Zhou, L., Yang, M., Lau, F.C., Wang, Y., and Xiong, Y. (2020, November 4–6). HiveD: Sharing a GPU Cluster for Deep Learning with Guarantees. Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), USENIX Association, Online.