1. Le Roux N, Schmidt M, Bach F (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. Adv Neural Inf Process Syst 2012:2663–2671
2. Reddi SJ, Hefny A, Sra S, Poczos B, Smola AJ (2015) On variance reduction in stochastic gradient descent and its asynchronous variants. Adv Neural Inf Process Syst 2015:2647–2655
3. Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math Program 162(1–2):83–112
4. Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: a fast incremental gradient method with support for non-strongly convex composite objectives. Adv Neural Inf Process Syst 2014:1646–1654
5. De S, Goldstein T (2016) Efficient distributed SGD with variance reduction. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 111–120