Authors:
Awan, Ammar Ahmad; Jain, Arpan; Anthony, Quentin; Subramoni, Hari; Panda, Dhabaleswar K.
Publisher:
Springer International Publishing
References: 21 articles.
1. Keras (2019).
https://keras.io/
2. Model parallelism in MXNet (2019).
https://mxnet.apache.org/api/faq/model_parallel_lstm
3. Akiba, T., Suzuki, S., Fukuda, K.: Extremely large minibatch SGD: training ResNet-50 on ImageNet in 15 minutes (2017). CoRR abs/1711.04325.
http://arxiv.org/abs/1711.04325
4. Awan, A.A., Chu, C., Subramoni, H., Lu, X., Panda, D.K.: OC-DNN: exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training. In: 2018 IEEE 25th International Conference on High Performance Computing (HiPC), pp. 143–152, December 2018.
https://doi.org/10.1109/HiPC.2018.00024
5. Awan, A.A., Hamidouche, K., Hashmi, J.M., Panda, D.K.: S-Caffe: co-designing MPI runtimes and Caffe for scalable deep learning on modern GPU clusters. In: Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2017, pp. 193–205. ACM, New York (2017).
https://doi.org/10.1145/3018743.3018769
Cited by: 7 articles.
1. AshPipe: Asynchronous Hybrid Pipeline Parallel for DNN Training; Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region; 2024-01-18
2. Deep Randomized Networks for Fast Learning; Lecture Notes in Computer Science; 2023
3. FuncPipe: A Pipelined Serverless Framework for Fast and Cost-Efficient Training of Deep Learning Models; Proceedings of the ACM on Measurement and Analysis of Computing Systems; 2022-12
4. Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters; Lecture Notes in Computer Science; 2022
5. Data-driven global weather predictions at high resolutions; The International Journal of High Performance Computing Applications; 2021-08-18