1. Bottou, L., Curtis, F.E., Nocedal, J.: Optimization methods for large-scale machine learning. SIAM Rev. 60(2), 223–311 (2018)
2. Johnson, R., Zhang, T.: Accelerating stochastic gradient descent using predictive variance reduction. In: Advances in Neural Information Processing Systems 26, pp. 315–323. Curran Associates, Inc. (2013)
3. Defazio, A., Bach, F., Lacoste-Julien, S.: SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In: Advances in Neural Information Processing Systems 27, pp. 1646–1654. Curran Associates, Inc. (2014)
4. Byrd, R.H., Hansen, S.L., Nocedal, J., Singer, Y.: A stochastic quasi-Newton method for large-scale optimization. SIAM J. Optim. 26(2), 1008–1031 (2016)
5. Mokhtari, A., Ribeiro, A.: RES: Regularized stochastic BFGS algorithm. IEEE Trans. Signal Process. 62(23), 6089–6104 (2014)