CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel

Published: 2023-08-07
Container-title: Proceedings of the 52nd International Conference on Parallel Processing

Author:

Li Zhenxing¹, Cao Qiang¹, Chen Yajie², Yan Wenrui³

Affiliation:

1. Huazhong University of Science and Technology, China

2. Nanjing University of Science and Technology, China

3. Shanghai AI Laboratory, China

Funder

Key Research and Development Project of Hubei

National Key Research and Development Program of China

NSFC

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3605573.3605647

References: 32 articles.

1. Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, 2016. TensorFlow: a system for large-scale machine learning. In OSDI, Vol. 16. Savannah, GA, USA, 265–283.

2. Jonghyun Bae, Jongsung Lee, Yunho Jin, Sam Son, Shine Kim, Hakbeom Jang, Tae Jun Ham, and Jae W Lee. 2021. FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks. In FAST. 387–401.

3. Large-Scale Machine Learning with Stochastic Gradient Descent

4. Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems 33 (2020), 1877–1901.

5. Yu Cao, Wei Bi, Meng Fang, and Dacheng Tao. 2020. Pretrained language models for dialogue generation with multiple input sources. arXiv preprint arXiv:2010.07576 (2020).

Cited by 2 articles.

1. Large models for intelligent transportation systems and autonomous vehicles: A survey; Advanced Engineering Informatics; 2024-10

2. Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers; Proceedings of the 14th Workshop on AI and Scientific Computing at Scale using Flexible Computing Infrastructures; 2024-06-03