Authors: Dai He-Sen, Li Xiao-Hui, Yin Fei, Yan Xudong, Mei Shuqi, Liu Cheng-Lin
Publisher: Springer Nature Switzerland