1. Bishop, C.M. (2006). Pattern Recognition and Machine Learning, Springer.
2. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, MIT Press.
3. Bottou, L., Curtis, F.E., and Nocedal, J. (2018). Optimization Methods for Large-Scale Machine Learning. SIAM Rev.
4. Kingma, D.P., and Ba, J. (2015, May 7–9). Adam: A Method for Stochastic Optimization. Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA.
5. Schmidt, M., Le Roux, N., and Bach, F. (2017). Minimizing Finite Sums with the Stochastic Average Gradient. Math. Program.