1. Agarwal (2017). Second-order stochastic optimization for machine learning in linear time. Journal of Machine Learning Research.
2. Baker (2019). Control variates for stochastic gradient MCMC. Statistics and Computing.
3. Barzilai (1988). Two-point step size gradient methods. IMA Journal of Numerical Analysis.
4. Baydin, A. G., Cornish, R., Rubio, D. M., Schmidt, M. W., & Wood, F. D. (2018). Online learning rate adaptation with hypergradient descent. In International Conference on Learning Representations.
5. Bordes (2009). SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research.