1. A theory of adaptive pattern classifiers;Amari;IEEE Trans. Electron. Comput.,1967
2. Stochastic gradient learning in neural networks;Rakhlin;Proc Neuro-Nímes,1991
3. Logarithmic regret algorithms for online convex optimization;Hazan;Mach. Learn.,2007
4. S. Lacoste-Julien, M. Schmidt, F. Bach, A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method, arXiv preprint arXiv:1212.2002.
5. Making gradient descent optimal for strongly convex stochastic optimization;Rakhlin,2012