1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).
2. Agarwal, A., and Duchi, J. C. Distributed delayed stochastic optimization. In Advances in Neural Information Processing Systems (2011), pp. 873--881.
3. Azizan, N., Lale, S., and Hassibi, B. Stochastic mirror descent on overparameterized nonlinear models: Convergence, implicit regularization, and generalization. arXiv preprint arXiv:1906.03830 (2019).
4. Bousquet, O., and Elisseeff, A. Stability and generalization. Journal of Machine Learning Research 2, Mar (2002), 499--526.
5. Defazio, A., Bach, F., and Lacoste-Julien, S. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (2014), pp. 1646--1654.