A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification

Published: 2019-08
Container-title: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence

Author:

Shi Shaohuai¹, Zhao Kaiyong¹, Wang Qiang¹, Tang Zhenheng¹, Chu Xiaowen¹

Affiliation:

1. Department of Computer Science, Hong Kong Baptist University

Abstract

Gradient sparsification is a promising technique to significantly reduce the communication overhead in decentralized synchronous stochastic gradient descent (S-SGD) algorithms. Yet many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradients selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts system scalability. However, it remains unclear whether the gTop-k sparsification scheme converges in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and discuss the impact of the compression ratio on convergence performance.
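
To make the O(kP) versus O(k log P) contrast concrete, the following is a minimal single-process sketch of the idea, not the authors' implementation. It assumes that gTop-k merges sparse gradients pairwise in a tree, keeping only k entries after each merge, so that every worker exchanges k values in each of log2(P) rounds. The function names (`top_k`, `merge_top_k`, `gtop_k`) are hypothetical, P is assumed to be a power of two, and local residual accumulation is omitted.

```python
import heapq


def top_k(grad, k):
    """Keep the k largest-magnitude entries of a dense gradient as {index: value}."""
    idx = heapq.nlargest(k, range(len(grad)), key=lambda i: abs(grad[i]))
    return {i: grad[i] for i in idx}


def merge_top_k(a, b, k):
    """Sum two sparse gradients, then keep only the k largest-magnitude entries."""
    summed = dict(a)
    for i, v in b.items():
        summed[i] = summed.get(i, 0.0) + v
    keep = heapq.nlargest(k, summed, key=lambda i: abs(summed[i]))
    return {i: summed[i] for i in keep}


def gtop_k(local_grads, k):
    """Simulate a pairwise tree reduction over P workers (P assumed a power of two).

    Each round halves the number of sparse vectors, so there are log2(P) rounds
    and each message carries at most k (index, value) pairs.
    """
    sparse = [top_k(g, k) for g in local_grads]
    rounds = 0
    while len(sparse) > 1:
        sparse = [
            merge_top_k(sparse[i], sparse[i + 1], k)
            for i in range(0, len(sparse), 2)
        ]
        rounds += 1
    return sparse[0], rounds


# Four hypothetical workers, four-dimensional gradients, k = 2.
grads = [
    [1.0, -0.125, 0.0, 0.25],
    [0.75, 0.25, -0.5, 0.0],
    [-0.125, 0.0625, 0.5, 0.375],
    [0.125, -1.0, 0.625, 0.0],
]
result, rounds = gtop_k(grads, k=2)
print(result, rounds)
```

With plain Top-k, each worker would instead gather the k selected entries of all P workers, which is where the O(kP) term comes from. The result of the sketch is an approximation of the dense sum: entries dropped at an intermediate merge do not contribute to the final values.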

Publisher

International Joint Conferences on Artificial Intelligence Organization

Cited by 25 articles.

1. GSASG: Global Sparsification With Adaptive Aggregated Stochastic Gradients for Communication-Efficient Federated Learning; IEEE Internet of Things Journal; 2024-09-01

2. Information-Theoretically Private Federated Submodel Learning With Storage Constrained Databases; IEEE Transactions on Information Theory; 2024-08

3. AutoDDL: Automatic Distributed Deep Learning With Near-Optimal Bandwidth Cost; IEEE Transactions on Parallel and Distributed Systems; 2024-08

4. Permutation mask: a combined gradient sparsification for federated learning; Journal of Nonparametric Statistics; 2024-07-22

5. SparDL: Distributed Deep Learning Training with Efficient Sparse Communication; 2024 IEEE 40th International Conference on Data Engineering (ICDE); 2024-05-13