1. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)
2. Le, Q., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.: On optimization methods for deep learning. In: International Conference on Machine Learning. pp. 265–272. ACM, New York, USA (2011)
3. Bottou, L.: Stochastic gradient learning in neural networks. In: Proceedings of Neuro-Nîmes 91 (1991)
4. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
5. Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: Neural Information Processing Systems. vol. 20. Curran Associates, Inc. (2007). https://proceedings.neurips.cc/paper/2007/file/0d3180d672e08b4c5312dcdafdf6ef36-Paper.pdf