1. Cauchy, A.: Méthode générale pour la résolution des systèmes d’équations simultanées. C. R. Acad. Sci. Paris 25, 536–538 (1847)
2. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence $$O(1/k^2)$$ (1983)
3. Sutton, R.S.: Two problems with backpropagation and other steepest-descent learning procedures for networks. In: Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Erlbaum, Hillsdale (1986)
4. Duchi, J.C., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
5. McMahan, H.B., Streeter, M.J.: Delay-tolerant algorithms for asynchronous distributed online learning. In: NIPS (2014)