1. Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the north american chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). Association for Computational Linguistics, Minneapolis, pp 4171–4186. https://aclanthology.org/N19-1423
2. Conneau A, Wu S, Li H, Zettlemoyer L, Stoyanov V (2020) Emerging cross-lingual structure in pretrained language models. In: Proceedings of the 58th annual meeting of the association for computational linguistics. Association for Computational Linguistics, pp 6022–6034. https://www.aclweb.org/anthology/2020.acl-main.536
3. Raffel C, Shazeer N, Roberts A, Lee K, Narang S, Matena M, Zhou Y, Li W, Liu PJ (2020) Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 21(140):1–67
4. Raina R, Battle A, Lee H, Packer B, Ng AY (2007) Self-taught learning: transfer learning from unlabeled data. In: Proceedings of the 24th international conference on machine learning (ICML), Corvallis
5. Rapp R (1995) Identifying word translations in non-parallel texts. In: Proceedings of the 33rd annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Cambridge, pp 320–322