Affiliation:
1. Georgia Institute of Technology
2. IBM Thomas J. Watson Research Center
Abstract
We adopt and analyze a synchronous K-step averaging stochastic gradient descent algorithm, which we call K-AVG, for solving large-scale machine learning problems. We establish convergence results for K-AVG with nonconvex objectives. Our analysis of K-AVG applies to many existing variants of synchronous SGD. We explain why the K-step delay is necessary and leads to better performance than traditional parallel stochastic gradient descent, which is equivalent to K-AVG with $K=1$. We also show that K-AVG scales better with the number of learners than asynchronous stochastic gradient descent (ASGD). Another advantage of K-AVG over ASGD is that it allows larger stepsizes and facilitates faster convergence. On a cluster of $128$ GPUs, K-AVG is faster than ASGD implementations and achieves better accuracy and faster convergence when training on the CIFAR-10 dataset.
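To make the algorithm concrete, below is a minimal sketch of K-step averaging SGD as the abstract describes it: each of P learners takes K local mini-batch SGD steps, then all learner parameters are averaged before the next round. The serial simulation, the synthetic least-squares objective, and all names (P, K, gamma, rounds) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear regression: minimize ||A x - b||^2 with mini-batch SGD.
n, d = 1000, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

def minibatch_grad(x, batch_size=32):
    # Stochastic gradient of the mean squared error on a random mini-batch.
    idx = rng.integers(0, n, size=batch_size)
    Ab, bb = A[idx], b[idx]
    return 2.0 * Ab.T @ (Ab @ x - bb) / batch_size

P = 8          # number of learners
K = 16         # local steps between averaging (K=1 recovers parallel SGD)
gamma = 0.01   # stepsize
rounds = 50    # number of synchronization rounds

x = np.zeros(d)                      # shared (averaged) parameters
for _ in range(rounds):
    local = np.tile(x, (P, 1))       # each learner starts from the average
    for p in range(P):
        for _ in range(K):           # K independent local SGD steps
            local[p] -= gamma * minibatch_grad(local[p])
    x = local.mean(axis=0)           # synchronous K-step averaging

print("final loss:", np.mean((A @ x - b) ** 2))
```

Setting K=1 in this sketch averages after every step, which corresponds to the traditional parallel SGD baseline the abstract compares against; larger K reduces synchronization frequency at the cost of local parameter drift between averages.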
Publisher
International Joint Conferences on Artificial Intelligence Organization
Cited by
33 articles.