1. Allen-Zhu, Z.: Natasha 2: faster non-convex optimization than SGD. arXiv preprint arXiv:1708.08694 (2017)
2. Bernstein, J., Azizzadenesheli, K., Wang, Y.-X., Anandkumar, A.: Convergence rate of sign stochastic gradient descent for non-convex functions. In: International Conference on Machine Learning, pp. 560–569. PMLR (2018)
3. Bassily, R., Belkin, M., Ma, S.: On exponential convergence of SGD in non-convex over-parametrized learning. arXiv preprint arXiv:1811.02564 (2018)
4. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
5. Bach, F., Moulines, E.: Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). arXiv preprint arXiv:1306.2119 (2013)