1. Scaling laws for neural language models;Kaplan;arXiv preprint,2020
2. Language models are few-shot learners;Brown;Advances in Neural Information Processing Systems,2020
3. BLOOM: A 176B-parameter open-access multilingual language model;BigScience Workshop;arXiv preprint,2022
4. OPT: Open pre-trained transformer language models;Zhang;arXiv preprint,2022
5. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model;Smith;arXiv preprint,2022