1. Appalaraju, S., Jasani, B., Kota, B.U., Xie, Y., Manmatha, R.: DocFormer: end-to-end transformer for document understanding. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 973–983 (2021)
2. Denk, T.I., Reisswig, C.: BERTgrid: contextualized embedding for 2D document representation and understanding. In: Workshop on Document Intelligence at NeurIPS 2019 (2019)
3. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186 (2019)
4. Lecture Notes in Computer Science;Ł Garncarek,2021
5. Gu, J., et al.: Unidoc: unified pretraining framework for document understanding. Adv. Neural. Inf. Process. Syst. 34, 39–50 (2021)