Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency-Reference-Cited by-同舟云学术

Hanayo: Harnessing Wave-like Pipeline Parallelism for Enhanced Large Model Training Efficiency

Published:2023-11-11 Issue: Volume: Page:
ISSN:
Container-title:Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
language:
Short-container-title:

Author:

Liu Ziming¹^ORCID,Cheng Shenggan¹^ORCID,Zhou Haotian¹^ORCID,You Yang¹^ORCID

Affiliation:

1. School of Computing, National University of Singapore, Singapore, Singapore

Funder

National University of Singapore

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3581784.3607073

Reference41 articles.

1. Mandeep Baines Shruti Bhosale Vittorio Caggiano Naman Goyal Siddharth Goyal Myle Ott Benjamin Lefaudeux Vitaliy Liptchinsky Mike Rabbat Sam Sheiffer Anjali Sridhar and Min Xu. 2021. FairScale: A general purpose modular PyTorch library for high performance and large scale training. https://github.com/facebookresearch/fairscale. Mandeep Baines Shruti Bhosale Vittorio Caggiano Naman Goyal Siddharth Goyal Myle Ott Benjamin Lefaudeux Vitaliy Liptchinsky Mike Rabbat Sam Sheiffer Anjali Sridhar and Min Xu. 2021. FairScale: A general purpose modular PyTorch library for high performance and large scale training. https://github.com/facebookresearch/fairscale.

2. Zhengda Bian , Hongxin Liu , Boxiang Wang , Haichen Huang , Yongbin Li , Chuanrui Wang , Fan Cui , and Yang You . 2021. Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. arXiv preprint arXiv:2110.14883 ( 2021 ). Zhengda Bian, Hongxin Liu, Boxiang Wang, Haichen Huang, Yongbin Li, Chuanrui Wang, Fan Cui, and Yang You. 2021. Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. arXiv preprint arXiv:2110.14883 (2021).

3. Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell etal 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901. Tom Brown Benjamin Mann Nick Ryder Melanie Subbiah Jared D Kaplan Prafulla Dhariwal Arvind Neelakantan Pranav Shyam Girish Sastry Amanda Askell et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020) 1877--1901.

4. Tianqi Chen , Bing Xu , Chiyuan Zhang , and Carlos Guestrin . 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174 ( 2016 ). Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin. 2016. Training deep nets with sublinear memory cost. arXiv preprint arXiv:1604.06174 (2016).

5. Jacob Devlin , Ming-Wei Chang , Kenton Lee , and Kristina Toutanova . 2018 . Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

Cited by 3 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Feasibility Study of Edge Computing Empowered by Artificial Intelligence—A Quantitative Analysis Based on Large Models;Big Data and Cognitive Computing;2024-08-19

2. AdaPipe: Optimizing Pipeline Parallelism with Adaptive Recomputation and Partitioning;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27

3. FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters;Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming;2024-02-20