1. F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, "Robust and communication-efficient federated learning from non-i.i.d. data," IEEE Trans. Neural Netw. Learn. Syst., 2020.
2. M. D. Zeiler, "ADADELTA: An adaptive learning rate method," arXiv:1212.5701, 2012.
3. Y. Lin et al., "Deep gradient compression: Reducing the communication bandwidth for distributed training," in Proc. Int. Conf. Learn. Represent. (ICLR), 2018.
4. J. Wang et al., "On the unreasonable effectiveness of federated averaging with heterogeneous data," arXiv:2206.04723, 2022.
5. D. Alistarh et al., "QSGD: Communication-efficient SGD via gradient quantization and encoding," in Proc. 31st Conf. Neural Inf. Process. Syst. (NIPS), 2017.