1. Kingma. Adam: A method for stochastic optimization. ArXiv preprint, 2017.
2. Reddi. On variance reduction in stochastic gradient descent and its asynchronous variants. Advances in Neural Information Processing Systems, 2015.
3. Tieleman. Lecture 6.5-RMSprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
4. Tong. Calibrating the adaptive learning rate to improve convergence of ADAM. ArXiv, 2019.
5. Wind speed forecasting using the stationary wavelet transform and quaternion adaptive-gradient methods.