1. Conneau, A., et al.: Unsupervised cross-lingual representation learning at scale. CoRR abs/1911.02116 (2019). http://arxiv.org/abs/1911.02116
2. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), pp. 4171–4186. Association for Computational Linguistics, Minneapolis, Minnesota (2019). https://doi.org/10.18653/v1/N19-1423. https://aclanthology.org/N19-1423
3. Ghosal, D., Majumder, N., Mihalcea, R., Poria, S.: Two is better than many? Binary classification as an effective approach to multi-choice question answering (2022)
4. He, P., Liu, X., Gao, J., Chen, W.: DeBERTa: decoding-enhanced BERT with disentangled attention. In: International Conference on Learning Representations (2021). https://openreview.net/forum?id=XPZIaotutsD
5. Huang, L., Le Bras, R., Bhagavatula, C., Choi, Y.: Cosmos QA: machine reading comprehension with contextual commonsense reasoning. CoRR abs/1909.00277 (2019). http://arxiv.org/abs/1909.00277