1. Accelerating stochastic gradient descent using predictive variance reduction;johnson;Proc Neural Information Processing Systems,2013
2. A stochastic gradient method with an exponential convergence rate for finite training sets;le roux;Proc Neural Information Processing Systems,2012
3. Stochastic dual coordinate ascent methods for regularized loss minimization;shalev-shwartz;Journal of Machine Learning Research,2013
4. Adaptive subgradient methods for online learning and stochastic optimization;duchi;Journal of Machine Learning Research,2011
5. Adam: A method for stochastic optimization;kingma;arXiv preprint arXiv:1412.6980,2014