Author:
Wang Zhen, Zhu Peide, Yu Fuyang, Okumura Manabu
Publisher:
Springer Nature Switzerland
References (20 articles):
1. Antol, S., et al.: VQA: visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2425–2433 (2015)
2. Appalaraju, S., Jasani, B., Kota, B.U., Xie, Y., Manmatha, R.: DocFormer: end-to-end transformer for document understanding (2021)
3. Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint: arXiv:2106.08254 (2021)
4. Chen, N., Blostein, D.: A survey of document image classification: problem statement, classifier architecture and performance evaluation. IJDAR 10(1), 1–16 (2007)
5. Chen, X., et al.: Context autoencoder for self-supervised representation learning. arXiv preprint: arXiv:2202.03026 (2022)