Publisher
Springer Nature Switzerland