1. Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, and Dimitris Papailiopoulos. 2022. On the Utility of Gradient Compression in Distributed Training Systems. In MLSys.
2. Takuya Akiba, Shuji Suzuki, and Keisuke Fukuda. 2017. Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes. arXiv preprint arXiv:1711.04325.
3. Aida Amini, Saadia Gabriel, Shanchuan Lin, Rik Koncel-Kedziorski, Yejin Choi, and Hannaneh Hajishirzi. 2019. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms. In NAACL. 2357–2367.
4. Debraj Basu, Deepesh Data, Can Karakus, and Suhas Diggavi. 2019. Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations. In NeurIPS.
5. Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, and Anima Anandkumar. 2018. signSGD: Compressed Optimisation for Non-Convex Problems. In ICML.