1. Allen-Zhu, Z.: Natasha 2: Faster non-convex optimization than SGD. In: Advances in Neural Information Processing Systems, pp. 2675–2686 (2018)
2. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: International Conference on Machine Learning, pp. 699–707 (2016)
3. Arjevani, Y., Carmon, Y., Duchi, J.C., Foster, D.J., Srebro, N., Woodworth, B.: Lower bounds for non-convex stochastic optimization. arXiv:1912.02365 (2019)
4. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2(3), 1–27 (2011)
5. Chen, X., Liu, S., Sun, R., Hong, M.: On the convergence of a class of adam-type algorithms for non-convex optimization. In: International Conference on Learning Representations (2018)