Quantized Adam with Error Feedback

Published: 2021-10-31 Volume: 12 Issue: 5 Pages: 1-26
ISSN: 2157-6904
Container-title: ACM Transactions on Intelligent Systems and Technology
Language: en
Short-container-title: ACM Trans. Intell. Syst. Technol.

Author:

Chen Congliang¹, Shen Li², Huang Haozhi³, Liu Wei⁴

Affiliation:

1. The Chinese University of Hong Kong, Shenzhen, Guangdong, China

2. JD Explore Academy, Beijing, China

3. Tencent AI Lab, Guangdong, China

4. Tencent, Guangdong, China

Abstract

In this article, we present a distributed variant of an adaptive stochastic gradient method (Adam) for training deep neural networks in the parameter-server model. To reduce the communication cost between the workers and the server, we incorporate two quantization schemes, gradient quantization and weight quantization, into the proposed distributed Adam. In addition, to reduce the bias introduced by quantization, we propose an error-feedback technique that compensates for the error in the quantized gradient. Theoretically, in the stochastic nonconvex setting, we show that the distributed adaptive gradient method with gradient quantization and error feedback converges to a first-order stationary point, and that the distributed adaptive gradient method with weight quantization and error feedback converges to a point whose accuracy depends on the quantization level, under both the single-worker and multi-worker modes. Finally, we apply the proposed distributed adaptive gradient methods to train deep neural networks. Experimental results demonstrate the efficacy of our methods.
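The gradient-quantization-with-error-feedback idea in the abstract can be illustrated with a small sketch. The abstract does not specify the quantizer or the exact update rule, so everything below is an assumption for illustration, not the authors' algorithm: a scaled-sign quantizer (one sign bit per entry plus one shared scale), plain averaging of worker messages on the server, and a standard bias-corrected Adam step. The names `scaled_sign_quantize`, `ErrorFeedbackWorker`, and `AdamServer` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def scaled_sign_quantize(v: Vector) -> Vector:
    """Assumed quantizer (illustrative): sign of each entry times mean |v|.

    On the wire this would be one sign bit per entry plus a single float.
    """
    if not v:
        return []
    scale = sum(abs(x) for x in v) / len(v)
    return [scale if x >= 0 else -scale for x in v]


class ErrorFeedbackWorker:
    """Hypothetical worker: quantizes its gradient, keeps the residual locally."""

    def __init__(self, dim: int) -> None:
        self.error: Vector = [0.0] * dim  # accumulated quantization error

    def compress(self, grad: Vector) -> Vector:
        # Add back what earlier rounds failed to transmit, then quantize.
        corrected = [g + e for g, e in zip(grad, self.error)]
        q = scaled_sign_quantize(corrected)
        # Remember the part the quantizer dropped; it is re-sent next round.
        self.error = [c - qi for c, qi in zip(corrected, q)]
        return q


class AdamServer:
    """Hypothetical parameter server: averages messages, takes one Adam step."""

    def __init__(self, params: Vector, lr: float = 0.1, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8) -> None:
        self.x: Vector = list(params)
        self.m: Vector = [0.0] * len(params)  # first-moment estimate
        self.v: Vector = [0.0] * len(params)  # second-moment estimate
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.t = 0

    def step(self, messages: List[Vector]) -> Vector:
        self.t += 1
        n = len(messages)
        g = [sum(col) / n for col in zip(*messages)]  # average over workers
        self.m = [self.beta1 * mi + (1 - self.beta1) * gi
                  for mi, gi in zip(self.m, g)]
        self.v = [self.beta2 * vi + (1 - self.beta2) * gi * gi
                  for vi, gi in zip(self.v, g)]
        bc1 = 1 - self.beta1 ** self.t  # bias correction for m
        bc2 = 1 - self.beta2 ** self.t  # bias correction for v
        self.x = [xi - self.lr * (mi / bc1) / (math.sqrt(vi / bc2) + self.eps)
                  for xi, mi, vi in zip(self.x, self.m, self.v)]
        return self.x


if __name__ == "__main__":
    # Toy problem (hypothetical): worker i holds f_i(x) = 0.5 * ||x - c_i||^2,
    # so the averaged objective is minimized at the mean of the centers.
    centers = [[1.0, -2.0, 0.5], [3.0, 0.0, -0.5]]
    workers = [ErrorFeedbackWorker(3) for _ in centers]
    server = AdamServer([0.0, 0.0, 0.0], lr=0.05)
    for _ in range(300):
        x = server.x  # parameters broadcast by the server
        msgs = [w.compress([xi - ci for xi, ci in zip(x, c)])
                for w, c in zip(workers, centers)]
        server.step(msgs)
    print("final x:", [round(v, 3) for v in server.x],
          "minimizer:", [2.0, -1.0, 0.0])
```

Because each worker carries its residual forward, the sum of everything it has sent plus its current `error` equals the sum of its true gradients; this telescoping property is what lets error feedback offset the bias of a lossy quantizer over time. Weight quantization, the second scheme in the abstract, is not modeled in this sketch.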

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Theoretical Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/3470890

References (48 articles)

1. Amitabh Basu, Soham De, Anirbit Mukherjee, and Enayat Ullah. 2018. Convergence guarantees for RMSProp and Adam in non-convex optimization and an empirical comparison to Nesterov acceleration. arXiv:1807.06766.

2. Xiangyi Chen, Sijia Liu, Ruoyu Sun, and Mingyi Hong. 2018. On the convergence of a class of Adam-type algorithms for non-convex optimization. arXiv:1808.02941.

Cited by 9 articles.

1. A novel optical orthogonality control approach based on noise propagation and its application in spin-exchange relaxation-free co-magnetometers; Measurement; 2025-01

2. AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks; Neural Networks; 2024-01

3. Efficient Federated Learning Via Local Adaptive Amended Optimizer With Linear Speedup; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2023-12

4. Enhancing Communication Efficiency in Adam Optimizer for Distributed Deep Learning; 2023 10th International Conference on Advanced Informatics: Concept, Theory and Application (ICAICTA); 2023-10-07

5. Communication-Efficient Nonconvex Federated Learning With Error Feedback for Uplink and Downlink; IEEE Transactions on Neural Networks and Learning Systems; 2023