1. Aghajanyan A, Shrivastava A, Gupta A, et al (2020) Better fine-tuning by reducing representational collapse. CoRR. arXiv:2008.03156
2. Artetxe M, Ruder S, Yogatama D (2020) On the cross-lingual transferability of monolingual representations. In: Jurafsky D, Chai J, Schluter N, et al (eds) Proceedings of the 58th annual meeting of the association for computational linguistics, ACL 2020, Online, July 5–10, 2020. Association for Computational Linguistics, pp 4623–4637. https://www.aclweb.org/anthology/2020.acl-main.421/
3. Athiwaratkun B, Finzi M, Izmailov P, et al (2019) There are many consistent explanations of unlabeled data: why you should average. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019. OpenReview.net. https://openreview.net/forum?id=rkgKBhA5Y7
4. Carmon Y, Raghunathan A, Schmidt L, et al (2019) Unlabeled data improves adversarial robustness. In: Wallach HM, Larochelle H, Beygelzimer A, et al (eds) Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, NeurIPS 2019, December 8–14, 2019, Vancouver, BC, Canada, pp 11190–11201. http://papers.nips.cc/paper/9298-unlabeled-data-improves-adversarial-robustness
5. Chi Z, Dong L, Wei F, et al (2020) InfoXLM: an information-theoretic framework for cross-lingual language model pre-training. CoRR. arXiv:2007.07834