1. ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning;Zhang;International Conference on Machine Learning,2017
2. Deep gradient compression: Reducing the communication bandwidth for distributed training;Lin;arXiv preprint,2017
3. Optimal distributed online prediction using mini-batches;Dekel;Journal of Machine Learning Research,2012
4. Better mini-batch algorithms via accelerated gradient methods;Cotter;Advances in Neural Information Processing Systems,2011
5. Local SGD converges fast and communicates little;Stich;arXiv preprint,2018