1. Probability and measure;Billingsley,1979
2. On the convergence of a class of ADAM-type algorithms for non-convex optimization;Chen,2018
3. Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration;De,2018
4. Adaptive subgradient methods for online learning and stochastic optimization;Duchi;Journal of Machine Learning Research,2011
5. Deep learning;Goodfellow,2016