1. D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic, QSGD: Communication-efficient SGD via gradient quantization and encoding, in: Advances in Neural Information Processing Systems, 2017.
2. M. Assran, N. Loizou, N. Ballas, M. Rabbat, Stochastic gradient push for distributed deep learning, in: Proceedings of the International Conference on Machine Learning, PMLR, 2019, pp. 344–353.
3. A. Beznosikov, S. Horváth, P. Richtárik, M. Safaryan, On biased compression for distributed learning, arXiv preprint arXiv:2002.12410 (2020).
4. S. Coste, The spectral gap of sparse random digraphs, 2021.
5. J.C. Duchi, A. Agarwal, M.J. Wainwright, Dual averaging for distributed optimization: convergence analysis and network scaling, IEEE Trans. Autom. Control (2011).