Authors:
Aach Marcel, Inanc Eray, Sarma Rakesh, Riedel Morris, Lintermann Andreas
Abstract
Continuously increasing data volumes from multiple sources, such as simulation and experimental measurements, demand efficient algorithms for analysis within a realistic timeframe. Deep learning models have proven to be capable of understanding and analyzing large quantities of data with high accuracy. However, training them on massive datasets remains a challenge and requires distributed learning exploiting High-Performance Computing systems. This study presents a comprehensive analysis and comparison of three well-established distributed deep learning frameworks—Horovod, DeepSpeed, and Distributed Data Parallel by PyTorch—with a focus on their runtime performance and scalability. Additionally, the performance of two data loaders, the native PyTorch data loader and the DALI data loader by NVIDIA, is investigated. To evaluate these frameworks and data loaders, three standard ResNet architectures with 50, 101, and 152 layers are tested using the ImageNet dataset. The impact of different learning rate schedulers on validation accuracy is also assessed. The novel contribution lies in the detailed analysis and comparison of these frameworks and data loaders on the state-of-the-art Jülich Wizard for European Leadership Science (JUWELS) Booster system at the Jülich Supercomputing Centre, using up to 1024 A100 NVIDIA GPUs in parallel. Findings show that the DALI data loader significantly reduces the overall runtime of ResNet50 from more than 12 h on 4 GPUs to less than 200 s on 1024 GPUs. The outcomes of this work highlight the potential impact of distributed deep learning using efficient tools on accelerating scientific discoveries and data-driven applications.
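To illustrate the kind of setup the abstract describes, the sketch below shows data-parallel training of a ResNet-50 on an ImageNet-style dataset with PyTorch Distributed Data Parallel, the native PyTorch data loader, and a learning rate scheduler. This is a minimal, illustrative example and not the authors' code: the dataset path, batch size, epoch count, and the choice of a cosine annealing scheduler are assumptions made for the sake of a self-contained snippet.

```python
# Minimal sketch (not the authors' implementation): one process per GPU,
# launched e.g. via torchrun or srun, which set RANK/LOCAL_RANK/WORLD_SIZE.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from torchvision import datasets, models, transforms


def main():
    # NCCL backend for multi-GPU communication (all-reduce of gradients).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Standard ImageNet-style preprocessing with the native PyTorch data loader.
    transform = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    train_set = datasets.ImageFolder("/path/to/imagenet/train", transform)  # placeholder path
    sampler = DistributedSampler(train_set)  # shards the data across ranks
    loader = DataLoader(train_set, batch_size=256, sampler=sampler,
                        num_workers=8, pin_memory=True)

    # ResNet-50 wrapped in DistributedDataParallel for synchronous data parallelism.
    model = models.resnet50(weights=None).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    criterion = torch.nn.CrossEntropyLoss().cuda(local_rank)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)
    # One of several schedulers one might compare; cosine annealing is shown here.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=90)

    for epoch in range(90):
        sampler.set_epoch(epoch)  # reshuffle shards every epoch
        model.train()
        for images, targets in loader:
            images = images.cuda(local_rank, non_blocking=True)
            targets = targets.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()  # DDP overlaps gradient all-reduce with backprop
            optimizer.step()
        scheduler.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Swapping the native data loader for NVIDIA DALI, or replacing DDP with Horovod or DeepSpeed, changes only the data pipeline and the wrapping of model and optimizer; the training loop structure stays largely the same, which is what makes the runtime comparison in the paper possible.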
Funder
Horizon 2020
Forschungszentrum Jülich GmbH
Publisher
Springer Science and Business Media LLC
Subject
Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems
References: 50 articles.
1. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
2. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An image is worth 16x16 words: transformers for image recognition at scale. In: International Conference on Learning Representations. 2021. arXiv:2010.11929
3. Ben-Nun T, Hoefler T. Demystifying parallel and distributed deep learning: an in-depth concurrency analysis. ACM Comput Surv. 2019. https://doi.org/10.1145/3320060
4. Goyal P, Dollár P, Girshick R, Noordhuis P, Wesolowski L, Kyrola A, Tulloch A, Jia Y, He K. Accurate, large minibatch SGD: training ImageNet in 1 hour. 2018. arXiv:1706.02677
5. Sergeev A, Del Balso M. Horovod: fast and easy distributed deep learning in TensorFlow. 2018. arXiv:1802.05799
Cited by
1 article.
1. Comparative Profiling. In: Proceedings of the 4th Workshop on Machine Learning and Systems. 2024-04-22