Coding for Large-Scale Distributed Machine Learning

Published: 2022-09-12
Volume: 24
Issue: 9
Page: 1284
ISSN: 1099-4300
Journal: Entropy
Language: en

Author:

Xiao Ming, Skoglund Mikael

Abstract

This article aims to give a comprehensive and rigorous review of the principles and recent developments of coding for large-scale distributed machine learning (DML). With increasing data volumes and the pervasive deployment of sensors and computing machines, machine learning has become more distributed. Moreover, the number of computing nodes and the data volumes involved in learning tasks have also increased significantly. For large-scale distributed learning systems, significant challenges have appeared in terms of delay, errors, efficiency, etc. To address these problems, various error-control or performance-boosting schemes have been proposed recently for different aspects, such as the duplication of computing nodes. More recently, error-control coding has been investigated for DML to improve reliability and efficiency. The benefits of coding for DML include high efficiency and low complexity, among others. Despite these benefits and recent progress, however, there is still no comprehensive survey on this topic, especially for large-scale learning. This paper seeks to introduce the theories and algorithms of coding for DML. For primal-based DML schemes, we first discuss gradient coding with the optimal code distance. Then, we introduce random coding for gradient-based DML. For primal–dual-based DML, i.e., ADMM (alternating direction method of multipliers), we propose a separate coding method for two steps of distributed optimization. Then, coding schemes for different steps are discussed. Finally, a few potential directions for future work are given.

Funder

Swedish Research Council

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/9/1284/pdf

References: 47 articles.

1. A Survey on Large-Scale Machine Learning

2. MapReduce: Simplified data processing on large clusters; Dean; Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI), 2004

3. Scaling Distributed Machine Learning with the Parameter Server; Li; Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014

4. Speeding Up Distributed Machine Learning Using Codes

5. Federated Optimization: Distributed Machine Learning for On-Device Intelligence; Konečný; arXiv, 2016

Cited by 4 articles.

1. DAS: A DRL-Based Scheme for Workload Allocation and Worker Selection in Distributed Coded Machine Learning;IEEE Internet of Things Journal;2024-08-01

2. Wireless Distributed Matrix-Vector Multiplication Using Over-the-Air Computation and Analog Coding;IEEE Transactions on Wireless Communications;2024-08

3. Information Theoretic Methods for Future Communication Systems;Entropy;2023-02-21

4. New Classification Method for Independent Data Sources Using Pawlak Conflict Model and Decision Trees;Entropy;2022-11-04