1. Brown, T.B., et al.: Language models are few-shot learners. arXiv preprint arXiv:2005.14165 (2020)
2. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
3. Holeček, M.: Learning from similarity and information extraction from structured documents. Int. J. Doc. Anal. Recognit. (IJDAR) 24(3), 149–165 (2021)
4. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT: a lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019)
5. Liu, F., Jiao, Y., Massiah, J., Yilmaz, E., Havrylov, S.: Trans-encoder: unsupervised sentence-pair modelling through self- and mutual-distillations. arXiv preprint arXiv:2109.13059 (2021)