A snapshot of parallelism in distributed deep learning training
Published: 2024-06-30
Issue: 1
Volume: 25
Pages: 60-73
ISSN: 2539-2115
Container-title: Revista Colombiana de Computación
Language: en
Short-container-title: Rev. colomb. comput.
Authors: Romero-Sandí, Hairol; Núñez, Gabriel; Rojas, Elvis
Abstract
The accelerated development of applications related to artificial intelligence has led to increasingly complex neural network models with enormous numbers of parameters, currently reaching the trillions, which makes training them almost impossible without parallelization. Parallelism, applied through different approaches, is the mechanism used to make training feasible at large scale. This paper presents a snapshot of the state of the art in parallelism for deep learning training from multiple points of view. It addresses pipeline parallelism, hybrid parallelism, mixture-of-experts, and auto-parallelism, topics that currently play a leading role in scientific research in this area. Finally, we carry out a series of experiments with data parallelism and model parallelism so that the reader can observe the performance of the two types of parallelism and understand the approach of each one more clearly.
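As a minimal illustration of the two approaches compared in the paper's experiments, the sketch below contrasts data parallelism (every worker holds a full model replica and processes a shard of the batch) with model parallelism (the model's layers are split across workers, with activations passed between them). PyTorch is an assumed framework here, and the two-layer network, tensor shapes, and simulated "devices" are hypothetical; the abstract does not specify the experimental setup.

```python
# Minimal sketch contrasting data parallelism and model parallelism.
# Hypothetical two-layer network; runs on CPU so no GPUs are required.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(8, 16)
layer2 = nn.Linear(16, 2)
batch = torch.randn(4, 8)

# Data parallelism: each worker holds a full replica of the model and
# processes a different shard of the batch; in a real run, gradients
# would be averaged across replicas after the backward pass.
shards = batch.chunk(2)                      # one shard per (simulated) worker
replica_outputs = [layer2(layer1(s)) for s in shards]
out_data_parallel = torch.cat(replica_outputs)

# Model parallelism: the model itself is split across workers, each
# running only its own layers; activations flow from stage to stage.
dev1, dev2 = torch.device("cpu"), torch.device("cpu")  # stand-ins for two GPUs
stage1, stage2 = layer1.to(dev1), layer2.to(dev2)
activations = stage1(batch.to(dev1))
out_model_parallel = stage2(activations.to(dev2))

# Both decompositions compute the same function as the unsplit model.
assert torch.allclose(out_data_parallel, out_model_parallel)
```

In an actual distributed run, the batch shards and model stages would live on separate devices or nodes, with gradient averaging (data parallelism) or activation transfer (model parallelism) handled by a communication backend such as NCCL.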
Publisher
Universidad Autónoma de Bucaramanga