1. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
2. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
3. T. Brown, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems (2020).
4. B. Liétard, M. Abdou, A. Søgaard, Do language models know the way to Rome?, in: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, Punta Cana, Dominican Republic, 2021, pp. 510–517. URL: https://aclanthology.org/2021.blackboxnlp-1.40. doi:10.18653/v1/2021.blackboxnlp-1.40.
5. M. Joshi, et al., SpanBERT: Improving pre-training by representing and predicting spans, Transactions of the Association for Computational Linguistics (2020).