
Proactive Congestion Avoidance for Distributed Deep Learning

Published: 2020-12-29, Volume: 21, Issue: 1, Page: 174
ISSN: 1424-8220
Container-title: Sensors
Language: en

Authors:

Kang Minkoo, Yang Gyeongsik, Yoo Yeonho, Yoo Chuck

Abstract

This paper presents “Proactive Congestion Notification” (PCN), a congestion-avoidance technique for distributed deep learning (DDL). DDL is widely used to scale out and accelerate deep neural network training. In DDL, each worker trains a copy of the deep learning model with different training inputs and synchronizes the model gradients at the end of each iteration. However, it is well known that the network communication for synchronizing model parameters is the main bottleneck in DDL. Our key observation is that the DDL architecture makes each worker generate burst traffic every iteration, which causes network congestion and in turn degrades the throughput of DDL traffic. Based on this observation, the key idea behind PCN is to prevent potential congestion by proactively regulating the switch queue length before DDL burst traffic arrives at the switch, which prepares the switches for handling incoming DDL bursts. In our evaluation, PCN improves the throughput of DDL traffic by 72% on average.
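To make the burst/queue interaction described in the abstract concrete, the following is a toy discrete-time model of one switch egress queue shared by a periodic DDL burst and a background flow. It is a hypothetical sketch, not PCN's actual mechanism: the function `simulate`, the buffer size, drain rate, standing queue, burst size, and the idea of pausing the background sender for `lead` ticks before each burst are all illustrative assumptions. It only shows why lowering the queue before a predictable burst arrives avoids tail drops, whereas reacting after the burst has already filled the buffer cannot.

```python
"""Toy model of one switch egress queue shared by DDL bursts and a background
flow. All parameters are hypothetical; this is not the PCN implementation."""


def simulate(proactive: bool, ticks: int = 1000, buffer_cap: int = 100,
             drain_rate: int = 10, standing_queue: int = 60,
             burst_size: int = 80, period: int = 50, lead: int = 6) -> dict:
    """Return packet counts for one run of the toy model.

    Illustrative assumptions:
    - The egress link drains `drain_rate` packets per tick.
    - The background sender refills the queue to `standing_queue` packets each
      tick (a crude stand-in for a congestion controller holding a standing queue).
    - All DDL workers together send `burst_size` packets in the last tick of
      every `period`-tick training iteration (the gradient-synchronization burst).
    - If `proactive`, the background sender is paused for the `lead` ticks that
      end with the burst tick, so the queue drains before the burst arrives.
    """
    queue = 0
    stats = {"ddl_sent": 0, "ddl_dropped": 0, "bg_sent": 0}
    for t in range(ticks):
        # 1) Dequeue: the egress link transmits up to drain_rate packets.
        queue = max(0, queue - drain_rate)

        ticks_until_burst = period - 1 - (t % period)
        paused = proactive and ticks_until_burst < lead

        # 2) Background arrivals top the queue back up to the standing level.
        if not paused:
            bg = max(0, standing_queue - queue)
            queue += bg
            stats["bg_sent"] += bg

        # 3) DDL burst at the end of the iteration, tail-dropped once the buffer is full.
        if ticks_until_burst == 0:
            accepted = max(0, min(burst_size, buffer_cap - queue))
            queue += accepted
            stats["ddl_sent"] += burst_size
            stats["ddl_dropped"] += burst_size - accepted
    return stats


if __name__ == "__main__":
    for mode in (False, True):
        label = "proactive" if mode else "reactive"
        print(label, simulate(proactive=mode))
```

With the defaults, the reactive run drops 40 of every 80 burst packets (800 of 1,600 in total) because each burst lands on a 60-packet standing queue in a 100-packet buffer, while the proactive run drops none, at the cost of some background traffic during the pause windows. In this model the pause only needs to drain the standing queue to `buffer_cap - burst_size` packets, which here takes 4 ticks; that trade-off between headroom and background throughput is the design choice the sketch is meant to expose.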

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/21/1/174/pdf

References (45 articles)

1. Chen et al. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. arXiv, 2015.

Cited by 7 articles

1. Machine learning controller for data rate management in science DMZ networks. Computer Networks, 2024-04.

2. Exploring Wireless Sensing Technologies and Their Applications in the Dawn of 6G. Highlights in Science, Engineering and Technology, 2023-10-09.

3. Performance Analysis of Software-Defined Networks to Mitigate Private VLAN Attacks. Sensors, 2023-02-04.

4. Study on Identification and Prevention of Traffic Congestion Zones Considering Resilience-Vulnerability of Urban Transportation Systems. Sustainability, 2022-12-16.

5. A survey on TCP enhancements using P4-programmable devices. Computer Networks, 2022-07.