Heterogeneous gradient computing optimization for scalable deep neural networks

Published: 2022-03-19 Volume: 78 Issue: 11 Pages: 13455-13469
ISSN: 0920-8542
Container title: The Journal of Supercomputing
Language: en
Short container title: J Supercomput

Authors

Moreno-Álvarez Sergio, Paoletti Mercedes E., Rico-Gallego Juan A., Haut Juan M.

Abstract

Nowadays, data processing applications based on neural networks must cope with the growing amount of data to be processed and with the increasing depth and complexity of neural network architectures, and hence with the growing number of parameters to be learned. High-performance computing platforms provide fast computing resources, including multi-core processors and graphics processing units, to manage the computational burden of deep neural network applications. A common optimization technique is to distribute the workload among the processes deployed on the platform's resources. This approach is known as data parallelism. Each process, known as a replica, trains its own copy of the model on a disjoint data partition. However, when the platform's computational resources are heterogeneous, the workload must be distributed unevenly among the replicas, according to their computational capabilities, to optimize overall execution performance. Since each replica then processes a different amount of data, the gradients computed by the replicas should have a different influence on the global parameter update. This work proposes a modification of the gradient computation method that accounts for the different speeds of the replicas, and hence for the amount of data assigned to each one. Experiments conducted on heterogeneous high-performance computing platforms for a wide range of models and datasets show an improvement in final accuracy with respect to current techniques, with comparable performance.
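
The abstract does not state the modified gradient formula. As an illustration only, the sketch below assumes one common weighting scheme: each replica i computes the mean gradient over its b_i local samples, and the aggregate scales that gradient by b_i / B (B being the global batch size) instead of the uniform 1/P used by plain data-parallel averaging over P replicas. Under that assumption the aggregate equals the mean gradient over the whole global batch. The speed-proportional split, the helper names (partition_batch, weighted_average, mse_grad), the replica speeds, and the linear least-squares model are all hypothetical and are not taken from the paper.

```python
import numpy as np


def partition_batch(global_batch, speeds):
    """Hypothetical helper: split global_batch samples among replicas in
    proportion to their relative speeds, using largest-remainder rounding."""
    speeds = np.asarray(speeds, dtype=float)
    ideal = speeds / speeds.sum() * global_batch
    sizes = np.floor(ideal).astype(int)
    # Hand the leftover samples to the replicas with the largest fractional parts.
    leftover = global_batch - sizes.sum()
    order = np.argsort(-(ideal - sizes), kind="stable")
    sizes[order[:leftover]] += 1
    return sizes


def weighted_average(grads, sizes):
    """Assumed aggregation rule: weight each replica's mean gradient by b_i / B."""
    weights = np.asarray(sizes, dtype=float) / np.sum(sizes)
    return sum(w * g for w, g in zip(weights, grads))


def mse_grad(w, X, y):
    """Gradient of the mean squared error (1/n) * ||X w - y||^2 with respect to w."""
    return 2.0 / X.shape[0] * X.T @ (X @ w - y)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)
w = np.zeros(3)

# Four replicas; the first is assumed to be four times faster than the slowest two.
sizes = partition_batch(100, [4.0, 2.0, 1.0, 1.0])
bounds = np.cumsum([0, *sizes])
grads = [mse_grad(w, X[a:b], y[a:b]) for a, b in zip(bounds[:-1], bounds[1:])]

full = mse_grad(w, X, y)                   # gradient over the whole global batch
weighted = weighted_average(grads, sizes)  # size-weighted aggregation
uniform = np.mean(grads, axis=0)           # plain 1/P averaging
print(sizes)  # [50 25 13 12]
print(np.allclose(weighted, full), np.allclose(uniform, full))
```

With plain 1/P averaging, the 12- and 13-sample shards count as much as the 50-sample shard, so their noisier gradients are over-represented; weighting by b_i / B removes that imbalance in this toy setting.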

Funder

Horizon 2020

Consejería de Educación y Empleo, Junta de Extremadura

Ministerio de Ciencia, Innovación y Universidades

Universidad de Extremadura

Publisher

Springer Science and Business Media LLC

Subject

Hardware and Architecture, Information Systems, Theoretical Computer Science, Software

Link

https://link.springer.com/content/pdf/10.1007/s11227-022-04399-2.pdf

Cited by 2 articles

1. Hyperspectral Image Analysis Using Cloud-Based Support Vector Machines; SN Computer Science; 2024-07-24

2. A survey of compute nodes with 100 TFLOPS and beyond for supercomputers; CCF Transactions on High Performance Computing; 2024-05-23