1. Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., et al.: TensorFlow: a system for large-scale machine learning. Proc. USENIX OSDI 2016, 265–283 (2016)
2. Berral, J.L., Wang, C., Youssef, A.: AI4DL: mining behaviors of deep learning workloads for resource management. In: Proceedings of USENIX HotCloud (2020)
3. Chen, C., Wang, W., Li, B.: Round-robin synchronization: mitigating communication bottlenecks in parameter servers. Proc. IEEE INFOCOM 2019, 532–540 (2019)
4. Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., Xiao, T., Xu, B., Zhang, C., Zhang, Z.: MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems (2015). arXiv preprint arXiv:1512.01274
5. Gu, J., Chowdhury, M., Shin, K.G., Zhu, Y., Jeon, M., Qian, J., Liu, H., Guo, C.: Tiresias: a GPU cluster manager for distributed deep learning. Proc. USENIX NSDI 2019, 485–500 (2019)