DisSAGD: A Distributed Parameter Update Scheme Based on Variance Reduction

Published: 2021-07-28 Issue: 15 Volume: 21 Page: 5124
ISSN: 1424-8220
Container-title: Sensors
language: en
Short-container-title: Sensors

Author:

Pan Haijie, Zheng Lirong

Abstract

Machine learning models trained with SGD often converge slowly and unstably because gradients estimated from random samples have high variance. To speed up convergence and improve stability, this study proposes DisSAGD, a distributed SGD algorithm based on variance reduction. DisSAGD corrects the gradient estimate at each iteration using the gradient variance of historical iterations, without full gradient computation or additional storage; that is, it reduces the mean variance of historical gradients in order to reduce the error in parameter updates. We implemented DisSAGD on distributed clusters to train a machine learning model, sharing parameters among nodes through an asynchronous communication protocol. We also propose an adaptive learning rate strategy and a sampling strategy to address the update lag of the overall parameter distribution, which helps improve convergence speed when the parameters deviate from the optimal value: when one working node is faster than another, it has more time to compute the local gradient and draw more samples for the next iteration. Our experiments show that DisSAGD significantly reduces waiting times during loop iterations, converges faster than traditional methods, and achieves speedups on distributed clusters.
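
The abstract does not state the DisSAGD update rule, so the following is only a minimal sketch of the general idea of variance-reduced gradient correction, using the well-known SVRG-style control variate instead. It is not the authors' method: this variant recomputes a full gradient at each snapshot, which the abstract says DisSAGD avoids. The function names (`grad_i`, `svrg`), the one-dimensional least-squares loss, the toy data, and all hyperparameters are assumptions made for illustration.

```python
import random


def grad_i(w, x, y):
    """Gradient w.r.t. w of the per-sample loss 0.5 * (w * x - y) ** 2."""
    return (w * x - y) * x


def svrg(data, w0=0.0, lr=0.1, epochs=30, inner_steps=None, seed=0):
    """SVRG-style variance-reduced SGD on a 1-D least-squares problem.

    Illustrative stand-in only; NOT the DisSAGD update rule.
    """
    rng = random.Random(seed)
    n = len(data)
    inner_steps = inner_steps or n
    w = w0
    for _ in range(epochs):
        w_snap = w  # snapshot of the parameters for this epoch
        # Full gradient at the snapshot (the step DisSAGD is said to avoid).
        mu = sum(grad_i(w_snap, x, y) for x, y in data) / n
        for _ in range(inner_steps):
            x, y = data[rng.randrange(n)]
            # Corrected estimate: still unbiased, and its variance shrinks
            # as w approaches w_snap.
            g = grad_i(w, x, y) - grad_i(w_snap, x, y) + mu
            w -= lr * g
    return w


# Hypothetical noiseless toy data generated from y = 3 * x.
data = [(x, 3.0 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_hat = svrg(data)
print(w_hat)
```

On this toy problem the corrected estimate lets a constant learning rate drive `w_hat` toward the generating slope, whereas plain SGD on noisy data typically needs a decaying rate to suppress gradient noise.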

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/15/5124/pdf

References: 45 articles.

1. The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization

2. A distributed stochastic gradient algorithm for economic dispatch over directed network with communication delays

3. Database Meets Deep Learning

Cited by 3 articles.

1. Sarung Tangan Pemeriksa Kesehatan Ayam Pedaging (SASETAN) Terintegrasi dengan Teknologi Arduino Uno; International Journal of Natural Science and Engineering; 2023-07-25

2. Blind Detection of Broadband Signal Based on Weighted Bi-Directional Feature Pyramid Network; Sensors; 2023-01-30

3. N-SVRG: Stochastic Variance Reduction Gradient with Noise Reduction Ability for Small Batch Samples; Computer Modeling in Engineering & Sciences; 2022