Funder
National Natural Science Foundation of China
Liaoning Revitalization Talents Program
The Scientific Research Project of Liaoning Province
Key R&D projects of Liaoning Provincial Department of Science and Technology
Liaoning Provincial Key Laboratory Special Fund
University of Economics Ho Chi Minh City
Publisher
Springer Science and Business Media LLC
References (46 articles)
1. Zhou L, Palangi H, Zhang L, Corso J, Gao J (2020) Unified vision-language pre-training for image captioning and VQA. Proc AAAI Conf Artif Intell 34(07):13041–13049
2. Hu X, Gan Z, Wang J, Yang Z, Liu Z, Lu Y, Wang L (2022) Scaling up vision-language pre-training for image captioning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. IEEE, pp 17980–17989
3. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Advances in neural information processing systems, vol 30
4. Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
5. Luo Y, Ji J, Sun X, Cao L, Wu Y, Huang F, Lin C-W, Ji R (2021) Dual-level collaborative transformer for image captioning. Proc AAAI Conf Artif Intell 35(3):2286–2293