1. Alistarh, D., Grubic, D., Li, J., Tomioka, R., Vojnovic, M.: QSGD: communication-efficient SGD via gradient quantization and encoding. In: Advances in Neural Information Processing Systems, pp. 1709–1720 (2017)
2. Alistarh, D., Hoefler, T., Johansson, M., Khirirat, S., Konstantinov, N., Renggli, C.: The convergence of sparsified gradient methods. In: Advances in Neural Information Processing Systems (2018)
3. Arora, S., Cohen, N., Hazan, E.: On the optimization of deep networks: implicit acceleration by overparameterization. In: International Conference on Machine Learning, pp. 244–253. PMLR (2018)
4. Beznosikov, A., Gasnikov, A.: Compression and data similarity: combination of two techniques for communication-efficient solving of distributed variational inequalities. arXiv preprint arXiv:2206.09446 (2022)
5. Beznosikov, A., Gasnikov, A.: Similarity, compression and local steps: three pillars of efficient communications for distributed variational inequalities. arXiv preprint arXiv:2302.07615 (2023)