1. N. Agarwal, Z. Allen-Zhu, B. Bullins, E. Hazan, T. Ma, Finding approximate local minima faster than gradient descent, in Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1195–1199 (2017)
2. K. Ahn, C. Yun, S. Sra, SGD with shuffling: optimal rates without component convexity and large epoch requirements. Adv. Neural Inf. Process. Syst. 33 (2020)
3. A. Ajalloeian, S.U. Stich, Analysis of SGD with biased gradient estimators. Preprint (2020). arXiv:2008.00051
4. D. Alistarh, D. Grubic, J. Li, R. Tomioka, M. Vojnovic, QSGD: communication-efficient SGD via gradient quantization and encoding. Adv. Neural Inf. Process. Syst. 30, 1709–1720 (2017)
5. Z. Allen-Zhu, Natasha: faster non-convex stochastic optimization via strongly non-convex parameter, in International Conference on Machine Learning, pp. 89–97 (2017)