Modeling the Training Iteration Time for Heterogeneous Distributed Deep Learning Systems

Published: 2023-02-21
Volume: 2023
Pages: 1-15
ISSN: 1098-111X
Container-title: International Journal of Intelligent Systems
Language: en
Short-container-title: International Journal of Intelligent Systems

Author:

Zeng Yifu¹², Chen Bowei², Pan Pulin², Li Kenli², Chen Guo²

Affiliation:

1. College of Computer Science and Engineering, Changsha University, Changsha 410022, Hunan, China

2. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China

Abstract

Distributed deep learning systems have become an effective response to the growing demand for large-scale data processing. However, building such systems with powerful computing nodes requires significant investment, which places a heavy financial burden on developers and researchers. It is therefore valuable to predict the benefit precisely, i.e., the speedup over training on a single machine (or a few machines), before actually building a large learning system. To address this problem, this paper presents a novel performance model of training iteration time for heterogeneous distributed deep learning systems, based on the characteristics of the parameter server (PS) system with bulk synchronous parallel (BSP) synchronization. The accuracy of our performance model is demonstrated by comparing its predictions with real measurements on TensorFlow when training different neural networks on various hardware testbeds: the prediction accuracy is higher than 90% in most cases.

Funder

National Basic Research Program of China

Publisher

Hindawi Limited

Subject

Artificial Intelligence, Human-Computer Interaction, Theoretical Computer Science, Software

Link

http://downloads.hindawi.com/journals/ijis/2023/2663115.pdf
