1. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proc. Conf. North Am. Chapter Assoc. Comput. Linguist. Hum. Lang. Technol., Vol. 1 (Long and Short Papers), 2019, pp. 4171–4186.
2. Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692, 2019.
3. Z. Yang et al., XLNet: Generalized autoregressive pretraining for language understanding, in: Adv. Neural Inf. Process. Syst., 2019.
4. I. Beltagy, M.E. Peters, A. Cohan, Longformer: The long-document transformer, arXiv preprint arXiv:2004.05150, 2020. https://doi.org/10.48550/arxiv.2004.05150.
5. Han, DTC: Transfer learning for commonsense machine comprehension, Neurocomputing, 2020.