1. Agarwal, N., Bullins, B., Hazan, E.: Second-order stochastic optimization for machine learning in linear time. J. Mach. Learn. Res. 18(116), 1–40 (2017)
2. Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training Resnet-50 on ImageNet in 15 minutes (2017). http://arxiv.org/abs/1711.04325
3. Allen-Zhu, Z.: Katyusha: The first direct acceleration of stochastic gradient methods. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pp. 1200–1205 (2017)
4. Allen-Zhu, Z., Hazan, E.: Variance reduction for faster non-convex optimization. In: Proceedings of the 33rd International Conference on Machine Learning, 699–707 (2016)
5. Andrew, G., Gao, J.: Scalable training of $$\ell _1$$-regularized log-linear models. In: Proceedings of the 24th International Conference on Machine Learning, 33–40 (2007)