1. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model;smith,2022
2. PanGu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation;zeng,2021
3. GPT-NeoX-20B: An Open-Source Autoregressive Language Model
4. Language models are few-shot learners;brown,2020