Asynchronous Mini-Batch Gradient Descent with Variance Reduction for Non-Convex Optimization

Published: 2017-02-13 Issue: 1 Volume: 31
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Author:

Huo Zhouyuan, Huang Heng

Abstract

We provide the first theoretical analysis of the convergence rate of asynchronous mini-batch gradient descent with variance reduction (AsySVRG) for non-convex optimization. Asynchronous stochastic gradient descent (AsySGD) has been broadly used for deep learning optimization, and it has been proved to converge at a rate of O(1/\sqrt{T}) for non-convex optimization. Recently, the variance reduction technique was proposed and proved to greatly accelerate the convergence of SGD. Asynchronous SGD with variance reduction has been shown to have a linear convergence rate when the problem is strongly convex. However, no prior work analyzes the convergence rate of this method for non-convex problems. In this paper, we consider two asynchronous parallel implementations of mini-batch gradient descent with variance reduction: one on a distributed-memory architecture and the other on a shared-memory architecture. We prove that both methods converge at a rate of O(1/T) for non-convex optimization, and that linear speedup is achievable as the number of workers increases. We evaluate our methods by optimizing multi-layer neural networks on two real datasets (MNIST and CIFAR-10), and the experimental results support our theoretical analysis.
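
The update described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes the standard SVRG estimator (mini-batch gradient at the current iterate, minus the same mini-batch gradient at a snapshot, plus the full gradient at the snapshot) and only simulates asynchrony by evaluating gradients at a randomly stale iterate. The toy non-convex loss, the function names, the step size, and the delay bound are all illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, x):
    return sum(aj * xj for aj, xj in zip(a, x))


def loss(x, data):
    """Average of the non-convex per-sample loss 0.5 * (sigmoid(a.x) - y)^2."""
    return sum(0.5 * (sigmoid(dot(a, x)) - y) ** 2 for a, y in data) / len(data)


def grad_i(x, a, y):
    """Gradient of the per-sample loss with respect to x."""
    s = sigmoid(dot(a, x))
    coeff = (s - y) * s * (1.0 - s)
    return [coeff * aj for aj in a]


def full_grad(x, data):
    g = [0.0] * len(x)
    for a, y in data:
        g = [gj + gij for gj, gij in zip(g, grad_i(x, a, y))]
    return [gj / len(data) for gj in g]


def make_data(n=40, seed=1):
    """Hypothetical toy dataset generated from fixed 'true' weights."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0, 0.5]
    data = []
    for _ in range(n):
        a = [rng.uniform(-1.0, 1.0) for _ in true_w]
        data.append((a, sigmoid(dot(a, true_w))))
    return data


def simulated_asysvrg(data, dim, epochs=20, inner_steps=50, batch=4,
                      lr=0.5, max_delay=3, seed=0):
    rng = random.Random(seed)
    x = [0.0] * dim
    for _ in range(epochs):
        snapshot = list(x)               # snapshot point for this epoch
        mu = full_grad(snapshot, data)   # full gradient at the snapshot
        history = [list(x)]              # recent iterates, newest last
        for _ in range(inner_steps):
            # Simulated asynchrony: a worker reads an iterate up to max_delay steps old.
            delay = rng.randint(0, min(max_delay, len(history) - 1))
            stale = history[-1 - delay]
            v = list(mu)
            for _ in range(batch):
                a, y = data[rng.randrange(len(data))]
                g_stale = grad_i(stale, a, y)
                g_snap = grad_i(snapshot, a, y)
                # Variance-reduced correction averaged over the mini-batch.
                v = [vj + (gs - gn) / batch
                     for vj, gs, gn in zip(v, g_stale, g_snap)]
            x = [xj - lr * vj for xj, vj in zip(x, v)]
            history = (history + [list(x)])[-(max_delay + 1):]
    return x
```

Calling `simulated_asysvrg(make_data(), dim=3)` returns a weight vector whose loss can be compared against the starting point with `loss`. A real distributed-memory or shared-memory implementation would replace the simulated delay with genuinely concurrent workers.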

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 6 articles.

1. A New Method for Optimization: Optimal Control;2024 36th Chinese Control and Decision Conference (CCDC);2024-05-25

2. A search strategy for publications in interdisciplinary research;El Profesional de la información;2023-10-13

3. Momentum-Based Variance-Reduced Proximal Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization;Journal of Optimization Theory and Applications;2022-12-02

4. Scaling up stochastic gradient descent for non-convex optimisation;Machine Learning;2022-10-07

5. Supervised Machine Learning Approaches for Medical Data Classification;2022 2nd International Conference on Artificial Intelligence and Signal Processing (AISP);2022-02-12