1. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, et al (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In: Proc. of the ICLR. OpenReview.net, Austria
2. Liu Z, Lin Y, Cao Y, Hu H, Wei Y, Zhang Z, Lin S, Guo B (2021) Swin transformer: Hierarchical vision transformer using shifted windows. In: Proc. of the IEEE/CVF ICCV. IEEE, Montreal, QC, Canada, pp 10012–10022
3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proc. of the NeurIPS, vol 30. Curran Associates Inc., Long Beach, CA, USA
4. Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K, et al (2016) Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144
5. Shoeybi M, Patwary M, Puri R, LeGresley P, Casper J, Catanzaro B (2019) Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053