1. Brown T, et al. Language models are few-shot learners. Advances in Neural Information Processing Systems. 2020.
2. Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. arXiv:1810.04805. Available from: https://ui.adsabs.harvard.edu/abs/2018arXiv181004805D.
3. Radford A, et al. Language models are unsupervised multitask learners. OpenAI Blog. 2019.
4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention Is All You Need. 2017. arXiv:1706.03762. Available from: https://ui.adsabs.harvard.edu/abs/2017arXiv170603762V.