1. BERT: Pre-training of deep bidirectional transformers for language understanding;Devlin
2. Language models are unsupervised multitask learners;Radford;OpenAI blog,2019
3. Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model;Smith,2022
4. Structured pruning of large language models