AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training

Published: 2018-04-29  Volume: 32  Issue: 1
ISSN: 2374-3468
Container-title: Proceedings of the AAAI Conference on Artificial Intelligence
Short-container-title: AAAI

Authors:

Chen Chia-Yu, Choi Jungwook, Brand Daniel, Agrawal Ankur, Zhang Wei, Gopalakrishnan Kailash

Abstract

Highly distributed training of Deep Neural Networks (DNNs) on future compute platforms (offering hundreds of TeraOps/s of computational capacity) is expected to be severely communication-constrained. To overcome this limitation, new gradient compression techniques are needed that are computationally friendly, applicable to the wide variety of layers found in DNNs, and adaptable to variations in network architectures and their hyper-parameters. In this paper we introduce a novel technique, the Adaptive Residual Gradient Compression (AdaComp) scheme. AdaComp is based on localized selection of gradient residues and automatically tunes the compression rate depending on local activity. We show excellent results on a wide spectrum of state-of-the-art Deep Learning models across multiple domains (vision, speech, language), datasets (MNIST, CIFAR10, ImageNet, BN50, Shakespeare), optimizers (SGD with momentum, Adam) and network parameters (number of learners, minibatch size, etc.). Exploiting both sparsity and quantization, we demonstrate end-to-end compression rates of ∼200× for fully-connected and recurrent layers, and ∼40× for convolutional layers, without any noticeable degradation in model accuracy.
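The abstract describes AdaComp only at a high level: localized selection of gradient residues, with a compression rate that adapts to local activity. The Python/NumPy sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function names `localized_residual_select` and `compression_ratio`, the default bin size, the rule of adding the latest gradient once more before thresholding, and the simple bits-per-entry cost model are all hypothetical choices made for illustration; quantization of the sent values, which the abstract also mentions, is omitted.

```python
import numpy as np


def localized_residual_select(grad, residue, bin_size=4):
    """Hypothetical sketch of localized, residue-based gradient selection.

    Returns (sent, new_residue, mask) for a flat gradient vector.
    """
    grad = np.asarray(grad, dtype=np.float64)
    residue = np.asarray(residue, dtype=np.float64)
    # Fold the new gradient into the residue left over from earlier steps.
    acc = residue + grad
    # Assumption: boost each element by its latest gradient before thresholding,
    # so the number of elements selected per bin varies with local activity.
    boosted = acc + grad

    n = acc.size
    pad = (-n) % bin_size
    # Zero padding does not change the per-bin maximum magnitude.
    padded = np.pad(np.abs(acc), (0, pad))
    local_max = padded.reshape(-1, bin_size).max(axis=1)
    # Broadcast each bin's local maximum back to the elements of that bin.
    threshold = np.repeat(local_max, bin_size)[:n]

    # Select elements whose boosted magnitude reaches the bin's local maximum;
    # bins whose accumulated values are all zero send nothing.
    mask = (np.abs(boosted) >= threshold) & (threshold > 0)
    sent = np.where(mask, acc, 0.0)
    # Unsent values stay in the residue and are reconsidered at later steps.
    new_residue = acc - sent
    return sent, new_residue, mask


def compression_ratio(n_elements, n_sent, bits_per_sent, dense_bits=32):
    """Dense payload size divided by sparse payload size (hypothetical cost model)."""
    if n_sent == 0:
        return float("inf")
    return (dense_bits * n_elements) / (bits_per_sent * n_sent)


if __name__ == "__main__":
    g = np.array([0.1, -0.5, 0.2, 0.05, 1.0, 0.0, 0.0, 0.0])
    r = np.zeros_like(g)
    sent, r_next, mask = localized_residual_select(g, r, bin_size=4)
    print(mask.astype(int))  # [0 1 0 0 1 0 0 0]
    print(compression_ratio(g.size, int(mask.sum()), bits_per_sent=32))  # 4.0
```

In this toy run each bin sends only the element that dominates its neighbourhood, while the rest is carried forward in the residue, so no gradient information is discarded outright. With 32 bits charged per sent entry the ratio reflects sparsity alone; encoding indices and quantized values more compactly is what would raise it further.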

Publisher

Association for the Advancement of Artificial Intelligence (AAAI)

Subject

General Medicine

Cited by 39 articles.

1. Communication-Efficient and Privacy-Preserving Aggregation in Federated Learning With Adaptability;IEEE Internet of Things Journal;2024-08-01

2. FedSZ: Leveraging Error-Bounded Lossy Compression for Federated Learning Communications;2024 IEEE 44th International Conference on Distributed Computing Systems (ICDCS);2024-07-23

3. Adaptive Top-K in SGD for Communication-Efficient Distributed Learning in Multi-Robot Collaboration;IEEE Journal of Selected Topics in Signal Processing;2024-04

4. Smart-Infinity: Fast Large Language Model Training using Near-Storage Processing on a Real System;2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2024-03-02

5. Distributed Analytics For Big Data: A Survey;Neurocomputing;2024-03