Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning

Published: 2016-09-25
Container-title: Proceedings of the 23rd European MPI Users' Group Meeting

Author:

Awan A. A.¹, Hamidouche K.¹, Venkatesh A.¹, Panda D. K.¹

Affiliation:

1. Department of Computer Science and Engineering, The Ohio State University

Publisher:

ACM

Link:

https://dl.acm.org/doi/pdf/10.1145/2966884.2966912

References (26 articles):

1. KESCH: Cray CS-Storm System (CSCS). http://www.cscs.ch/computers/kesch_escha/index.html.

2. CNTK. http://www.cntk.ai/, 2015.

3. Amazon. Deep Scalable Sparse Tensor Network Engine. https://github.com/amznlabs/amazon-dsstne, 2016.

4. Interprocessor collective communication library (InterCom).

5. D. Bureddy, H. Wang, A. Venkatesh, S. Potluri, and D. Panda. OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters. In J. Träff, S. Benkner, and J. Dongarra, editors, Recent Advances in the Message Passing Interface, volume 7490 of Lecture Notes in Computer Science, pages 110-120. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-33517-4. URL http://dx.doi.org/10.1007/978-3-642-33518-1_16.

Cited by 33 articles:

1. Fast and scalable all-optical network architecture for distributed deep learning; Journal of Optical Communications and Networking; 2024-02-22

2. Distributed out-of-memory NMF on CPU/GPU architectures; The Journal of Supercomputing; 2023-09-08

3. DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining; 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS); 2023-07

4. FMI: Fast and Cheap Message Passing for Serverless Functions; Proceedings of the 37th International Conference on Supercomputing; 2023-06-21

5. Lyra: Elastic Scheduling for Deep Learning Clusters; Proceedings of the Eighteenth European Conference on Computer Systems; 2023-05-08