GRPIC: an end-to-end image captioning model using three visual features

Published: 2024-09-04
ISSN: 1868-8071
Container-title: International Journal of Machine Learning and Cybernetics
Language: en
Short-container-title: Int. J. Mach. Learn. & Cyber.

Author

Peng Shixin, Xiong Can, Liu Leyuan, Yang Laurence T., Chen Jingying

Funder

Hubei Provincial Natural Science Foundation of China

Research Funds of CCNU from the Colleges' Basic Research and Operation of MOE

National Key Research and Development Program of China

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s13042-024-02352-8.pdf

References: 50 articles.

1. Vinyals O, Toshev A, Bengio S, Erhan D (2015) Show and tell: A neural image caption generator. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr.2015.7298935

2. Xu K, Ba J, Kiros R, Cho K, Courville A, Salakhudinov R, Zemel R, Bengio Y (2015) Show, attend and tell: Neural image caption generation with visual attention. In: International Conference on Machine Learning

3. Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031

4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A, Kaiser L, Polosukhin I (2017) Attention is all you need. In: Neural Information Processing Systems

5. Cornia M, Stefanini M, Baraldi L, Cucchiara R (2020) Meshed-memory transformer for image captioning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/cvpr42600.2020.01059