1. Devlin, J., Chang, M.-W., Lee, K. and Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186.
2. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O. and Bowman, S. R. (2019). GLUE: A multi-task benchmark and analysis platform for natural language understanding. In International Conference on Learning Representations (ICLR).
3. Dodge, J., Ilharco, G., Schwartz, R., Farhadi, A., Hajishirzi, H. and Smith, N. A. (2020). Fine-tuning pretrained language models: Weight initializations, data orders, and early stopping. CoRR, abs/2002.06305.
4. Howard, J. and Ruder, S. (2018). Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339.
5. He, P., Gao, J. and Chen, W. (2021). DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing. CoRR, abs/2111.09543.