1. Agarwal, A., Duchi, J.C.: Distributed delayed stochastic optimization. In: Advances in Neural Information Processing Systems, pp. 873–881 (2011)
2. Alistarh, D., Chatterjee, B., Kungurtsev, V.: Elastic consistency: a general consistency model for distributed stochastic gradient descent. arXiv preprint arXiv:2001.05918 (2020)
3. Alistarh, D., De Sa, C., Konstantinov, N.: The convergence of stochastic gradient descent in asynchronous shared memory. In: ACM Symposium on Principles of Distributed Computing, PODC 2018, pp. 169–178. ACM, New York (2018). https://doi.org/10.1145/3212734.3212763
4. Alistarh, D., Grubic, D., Li, J., Tomioka, R., Vojnovic, M.: QSGD: communication-efficient SGD via gradient quantization and encoding. In: Advances in Neural Information Processing Systems, pp. 1709–1720 (2017)
5. Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp. 483–485 (1967)