1. Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification;jain;J Mach Learn Res,2017
2. A Markov chain theory approach to characterizing the minimax optimality of stochastic gradient descent (for least squares);jain;Proc Conf Found Softw Technol Theor Comput Sci,0
3. Non-strongly-convex smooth stochastic approximation with convergence rate $O(1/n)$;bach;Proc Adv Neural Inf Process Syst,0
4. Breakdown Points of Affine Equivariant Estimators of Multivariate Location and Covariance Matrices
5. Federated residual learning;agarwal;CoRR,2020