1. Alistarh D, Grubic D, Li J, Tomioka R, Vojnovic M (2017) QSGD: communication-efficient SGD via gradient quantization and encoding. In: Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R (eds) Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems, pp 1709–1720
2. Ben-Nun T, Hoefler T (2018) Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. arXiv:1802.09941
3. Byrd J, Lipton Z (2019) What is the effect of importance weighting in deep learning? In: Chaudhuri K, Salakhutdinov R (eds) Proceedings of the 36th International Conference Machine Learning, P. Machine Learning Research, vol. 97. PMLR, pp 872–881
4. Chang HS, Learned-Miller EG, McCallum A (2017) Active bias: training more accurate neural networks by emphasizing high variance samples. In: NIPS
5. Chen C, Weng Q, Wang W, Li B, Li B (2020) Semi-dynamic load balancing. In: Proceedings of the 11th ACM symposium on cloud computing. https://doi.org/10.1145/3419111.3421299