1. OPT: Open Pre-trained Transformer Language Models; Zhang; arXiv preprint arXiv:2205.01068, 2022
2. PaLM: Scaling Language Modeling with Pathways; Chowdhery; arXiv preprint arXiv:2204.02311, 2022
3. Language Models are Few-Shot Learners; Brown; Advances in Neural Information Processing Systems, 2020
4. Attention Is All You Need; Vaswani; Advances in Neural Information Processing Systems, 2017
5. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Devlin; arXiv preprint arXiv:1810.04805, 2018